ABSTRACT
We use Kolmogorov’s algorithmic approach to information theory to define a concept of inde- 
pendence of sequences, or equivalently, the boundedness of their mutual information. This concept 
is applied to probability theory, intuitionistic logic, and the theory of algorithms. For each case, we 
study the advantage of accepting the postulate that the objects studied by the theory are independent 
of any sequence determined by a mathematical property. 
0. PRELIMINARIES


0.1 INTRODUCTION 

The following attempt to define precisely the concept of independence may seem frivolous, for 
there are probably as many different concepts of “independence” in science as there are concepts of 
“freedom” in the humanities. To justify our efforts, we try to demonstrate that this definition can 
be applied in various fields of mathematics. 

The main idea of this work is to formalize, justify and apply the following physical postulate: 
If $\alpha$ is a sequence generated by a process of the physical world, and $\beta$ is a sequence determined by a
property formulated with no reference to events of the physical world, then $\alpha$ and $\beta$ are independent.
This agrees with Church’s Thesis, which asserts the recursiveness of any sequence which is both 
mathematically defined and physically realizable, since a recursive sequence is by our definition 
independent even of itself. Our Independence Postulate expresses “autonomy of the physical world”, 
its independence of anything outside itself. This corresponds to the idea of causality in physics. 

The sequence of papers in a mathematical journal, or the sequence of oil prices, are examples
of $\alpha$. The sequence of all true assertions in number theory is an example of $\beta$. Below we take
sequences defined by mathematical properties as examples of $\beta$. In Chapters two through four, the
random sequences, the free-choice sequences of intuitionistic theories, and the representatives of
“regular” Turing degrees, respectively, are considered as $\alpha$. In each of these three cases we show that
accepting the Independence Postulate allows us to radically simplify the corresponding theories.

The theory is developed in the simplest version which is sufficient for the applications considered 
below. Proofs are presented rather formally and may be omitted at the first reading. 


0.2 AN EXAMPLE 

Recursive function theory allows us to construct analogues of many concepts of classical analysis 
by presuming recursive enumerability of the sets considered. The analogy obtained is quite good 
because intrinsically non-algorithmic methods are the exception rather than the rule in classical 
mathematics. Moreover, the general theory of algorithms is very similar to descriptive set theory. 
(This explains why the main attention of “constructive analysis” has been directed toward the 
search for exotic counter-examples to some theorems of classical analysis. These investigations have 
always remained of narrow special interest.) However, there is an important distinction between the 
constructive and classical theories: the existence of a universal algorithm. The set of r.e. sets is r.e. 
while the set of countable sets is uncountable. As was discovered in [S64, K65], this rather abstract
difference implies “more concrete” differences and opens new analytical possibilities which have no 
analogies in “non-algorithmic” analysis. Let us explain this with a simple but important example. 

The space $l_1$ of all absolutely convergent number series $p$ ($p \in l_1$ iff $p \in \mathbb{R}^{\mathbb{N}}$ and $\sum_x |p(x)| < \infty$) is
well studied in analysis. Its recursive analogue $\tilde{l}_1 \subset l_1$ consists of the r.e. elements of $l_1$ (i.e. of series
$p$ whose subgraph $\{(r, x): p(x) > r \in \mathbb{Q}\}$ is r.e.). It is known in calculus that $l_1$ has no element maximal to
within a constant factor: $\forall p \in l_1\ \exists q \in l_1\ \lim_x (q(x)/p(x)) = \infty$. In contrast to this, $\tilde{l}_1$ contains
an “absorbing” element $m$ such that $\forall q \in \tilde{l}_1\ \sup_x (q(x)/m(x)) < \infty$. This will follow from Theorem 1.
This fact is closely connected to the discovery by R.J. Solomonoff and A.N. Kolmogorov of optimal 
coding of finite objects which originated a new approach to information theory, the foundations of 
probability theory, inductive inference and a number of other fields. 
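The classical fact that $l_1$ has no element maximal to within a constant factor can be checked numerically. The sketch below uses the standard tail-sum construction $q(x) = p(x)/\sqrt{r(x)}$ with $r(x) = \sum_{k \ge x} p(k)$ (a textbook device, not taken from this paper): $q(x)/p(x) = 1/\sqrt{r(x)} \to \infty$, while $\sum_x q(x) \le 2\sqrt{r(0)}$ because $p(x)/\sqrt{r(x)} \le 2(\sqrt{r(x)} - \sqrt{r(x+1)})$. The series $p(x) = 2^{-(x+1)}$ is an arbitrary choice for illustration.

```python
import math


def p(x: int) -> float:
    """Example element of l_1 (an arbitrary choice): p(x) = 2^-(x+1), summing to 1."""
    return 2.0 ** -(x + 1)


def tail(x: int) -> float:
    """Tail sum r(x) = sum_{k >= x} p(k); for this p it equals 2^-x exactly."""
    return 2.0 ** -x


def q(x: int) -> float:
    """Dominating series q(x) = p(x) / sqrt(r(x)); here q(x)/p(x) = 2^(x/2) is unbounded."""
    return p(x) / math.sqrt(tail(x))


# Partial sums of q stay below the bound 2 * sqrt(r(0)) = 2, so q is still in l_1.
partial_sum = sum(q(x) for x in range(200))
ratios = [q(x) / p(x) for x in (0, 10, 20, 40)]
print(partial_sum, ratios)
```

The ratios grow without bound while the sum of $q$ stays finite, so no single $p$ absorbs every element of $l_1$.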

All of these results, which we combine under the name “algorithmic information theory”, are 
based on purely analytical features which distinguish the recursive analogues of some spaces of 
analysis from their classical prototypes. The preceding example illustrates this distinction. 


0.3 NOTATION AND ASSUMPTIONS 


There are several natural general contexts for the formulation of this work. They differ in the
space $\Omega$ of objects considered to be carriers of information. In all cases we need a countable family
of functions on $\Omega$ which extract parts of this information. Declaring these functions continuous and
identifying any two objects on which the values of all these functions coincide, we may regard $\Omega$ as
a topological space with a countable basis and with the Kolmogorov property: for any two points there
exists an open set containing exactly one of them (assuming this property for the functions’ range). Among
such spaces there is a universal one, i.e., a space containing a homeomorphic image of any other.
Formulating the theory for this space is the most general possibility. If the range of the family of
functions is a metric space, the Kolmogorov property of $\Omega$ is strengthened to complete regularity.
Among completely regular spaces with a countable base there is also a universal one, $\mathbb{R}^{\mathbb{N}}$. As usual,
considerations look much simpler for a regular space. We even introduce a further, inessential
simplification, namely, that $\Omega$ is totally disconnected (i.e., any two points can be distinguished by
a continuous mapping into a discrete space). Cantor’s perfect set $\bar{\mathbb{N}}^{\mathbb{N}}$ (or $\{0,1\}^{\mathbb{N}}$), which we do in
fact consider, is universal among such spaces. Moreover, what we discuss in most detail is an even
more special case: the space $\mathbb{N}$ of non-negative integers.


We compactify $\mathbb{N}$ to $\bar{\mathbb{N}}$ by adding the symbol $\infty$. The number $m + (m+n)(m+n+1)/2$ is
called the pair $(m,n)$ of the numbers $m, n \in \mathbb{N}$. This enumeration of pairs is a bijection from $\mathbb{N}^2$ onto $\mathbb{N}$.
The projections $\pi_1$ and $\pi_2$ are the functions on $\mathbb{N}$ such that $n = (\pi_1(n), \pi_2(n))$. Henceforth, $\Omega$ denotes
Cantor’s perfect set, represented in the form $\bar{\mathbb{N}}^{\mathbb{N}}$. This form is more convenient than $\{0,1\}^{\mathbb{N}}$ since
the pairs $(\alpha,\beta)$ and the projections $\pi_1$ and $\pi_2$ are more simply defined on it: $(\alpha,\beta)(i) = (\alpha(i), \beta(i))$, where
$\alpha(i), \beta(i) \in \bar{\mathbb{N}}$ are the $i$-th terms of $\alpha$ and $\beta$.
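The pairing function and its projections are easy to compute. A minimal sketch follows; the function names are hypothetical, finite lists stand in for infinite sequences, and the value $\infty$ is not handled.

```python
import math
from typing import List, Tuple


def pair(m: int, n: int) -> int:
    """The pair (m, n) = m + (m + n)(m + n + 1)/2, a bijection from N^2 onto N."""
    return m + (m + n) * (m + n + 1) // 2


def unpair(k: int) -> Tuple[int, int]:
    """Inverse of pair: returns (pi_1(k), pi_2(k))."""
    # w = m + n is the largest integer with w(w + 1)/2 <= k.
    w = (math.isqrt(8 * k + 1) - 1) // 2
    m = k - w * (w + 1) // 2
    return m, w - m


def pair_prefixes(alpha: List[int], beta: List[int]) -> List[int]:
    """Termwise pairing (alpha, beta)(i) = (alpha(i), beta(i)) on finite prefixes."""
    return [pair(a, b) for a, b in zip(alpha, beta)]
```

For example, `pair(3, 5)` is 39 and `unpair(39)` gives back `(3, 5)`.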


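Let $S_k = \{0, 1, \dots, k, \infty\}^k$ and $S = \bigcup_k S_k$. $\Lambda$ is the empty sequence, $S_0 = \{\Lambda\}$; each $x \in S$ is
identified with its number in a natural effective enumeration of $S$. $l(x)$ is the length of $x \in S$, i.e. the number $k$ such that
$x \in S_k$. If $\alpha \in \Omega$, or $\alpha \in S_n$ and $k \le n$, then $\alpha_k \in S_k$ is the initial segment of $\alpha$ of length $k$ in which all the
terms larger than $k$ are replaced by $\infty$. $x \subset y$ means $l(x) \le l(y)$ and $x = y_{l(x)}$; likewise for $x \subset \alpha$. $\Gamma_x$
is the set of $\alpha \in \Omega$ such that $x \subset \alpha$. The sets $\Gamma_x$ form a countable basis consisting of clopen (i.e.
closed and open) subsets of $\Omega$. It is easy to see that if $\Gamma_x \cap \Gamma_y \ne \emptyset$ then $\Gamma_x \subset \Gamma_y$ or $\Gamma_y \subset \Gamma_x$ (i.e. $y \subset x$ or
$x \subset y$). $B$ is the set of finite binary sequences; $\mathbb{Q}$, $\mathbb{Q}^+$, $\mathbb{R}$, $\mathbb{R}^+$ are the sets of rational,
nonnegative rational, real and nonnegative real numbers, respectively, and $\bar{\mathbb{R}}^+ = \mathbb{R}^+ \cup \{\infty\}$. $\lfloor x \rfloor$ is the integer part of $x$.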
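The segment operation $\alpha_k$ and the relation $x \subset y$ can be sketched as follows. Finite Python sequences stand in for elements of $\Omega$ and `math.inf` stands in for the symbol $\infty$; both representation choices are assumptions of this sketch.

```python
import math
from typing import Sequence, Tuple

INF = math.inf  # stands in for the added symbol "infinity" of the compactified N


def segment(alpha: Sequence[float], k: int) -> Tuple[float, ...]:
    """alpha_k: the first k terms of alpha, with every term larger than k replaced by INF."""
    if k > len(alpha):
        raise ValueError("alpha is too short for a segment of length k")
    return tuple(a if a <= k else INF for a in alpha[:k])


def is_prefix(x: Sequence[float], y: Sequence[float]) -> bool:
    """x ⊂ y: l(x) <= l(y) and x equals the segment of y of length l(x)."""
    return len(x) <= len(y) and tuple(x) == segment(y, len(x))
```

For $\alpha = (0, 5, 2, 1, 7, \dots)$ the segment $\alpha_3$ is $(0, \infty, 2)$: the term 5 exceeds 3 and is replaced, so $(0, 5, 2)$ is not a segment of $\alpha$ in this sense.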


While considering topological spaces with natural countable bases, we call an open set recursively
enumerable (r.e.) if it equals the union of an r.e. family of basis sets. We call a function $F$ with values
in $\mathbb{R}$ r.e. if its subgraph, i.e., the set $\{(x,r): r < F(x)\}$, is r.e., and we call $F$ recursive if both $F$ and
$-F$ are r.e. We shall systematically assert the recursive enumerability of sets without giving the
tedious formal constructions. Similarly, in Chapter 3, we assert the expressibility or provability of
predicates of formal arithmetic without writing out the corresponding lengthy formulas or proofs.
These assertions can be checked routinely.


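The symbols $\overset{+}{<}$, $\overset{+}{>}$ and $\overset{+}{=}$ denote inequality and equality to within an additive constant; $\overset{*}{<}$,
$\overset{*}{>}$ and $\overset{*}{=}$ denote these relations to within a constant factor; $\lesssim$, $\gtrsim$ and $\approx$ denote asymptotic
relations (i.e. $f \lesssim g \iff \forall \varepsilon > 0\ \exists a\ (f > a \Rightarrow g > (1-\varepsilon) f)$). Such expressions as $\sum f$, $\sup f$, $\min f$, etc. denote
the corresponding operations, taken over the values of all free variables of the term $f$.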
1. ALGORITHMIC INFORMATION THEORY


1.1 UNIVERSAL SEMIMEASURE. 


Let $\sigma(x) = \{y: x \subset y,\ l(y) = l(x)+1\}$. Then $\Gamma_x = \bigcup_{y \in \sigma(x)} \Gamma_y$. Each finite positive Borel
measure on $\Omega$ is uniquely determined by a function $\mu: S \to \mathbb{R}^+$ such that $\forall x\ \sum_{y \in \sigma(x)} \mu(y) = \mu(x)$.
We identify this function (giving the measure of the set $\Gamma_x$) with the measure itself. Let us
introduce a somewhat more general concept.


Definition 1: A semimeasure on $\Omega$ is a function $\mu: S \to \mathbb{R}^+$ such that $\forall x\ \sum_{y \in \sigma(x)} \mu(y) \le \mu(x)$.
Unless otherwise stipulated, we assume that semimeasures are normalized, that is, $\mu(\Lambda) = 1$.


A semimeasure $\mu$ corresponds to the measure $\mu'$ on $\Omega \cup S$, where for $x \in S$, $\mu'(\{x\}) = \mu(x) - \sum_{y \in \sigma(x)} \mu(y)$.
Any r.e. measure $\mu$ is also recursive, because $\mu(x) = 1 - \sum \mu(y)$, where $l(y) = l(x)$, $y \ne x$.


Theorem 1:
There exists a largest, to within a constant factor, r.e. semimeasure: $\exists M\ \forall \mu\ \exists c\ \forall x\ \mu(x) \le c M(x)$, where $\mu$ ranges over r.e. semimeasures.


Proof: We prove, first, that the set of all r.e. semimeasures is r.e. For each finite set $A \subset S \times \mathbb{Q}^+$
there exists a minimal semimeasure $\mu_A$ the closure of whose subgraph contains $A$. The norm of this $\mu_A$
(i.e. $\mu_A(\Lambda)$) is evidently computable for finite $A$. We denote it by $|A|$. Let $f(i,n)$ be a total recursive
function such that for each $i$ the set of values of $f(i,\cdot)$ is the $i$-th r.e. subset of $S \times \mathbb{Q}^+$ containing $(\Lambda, 1)$.
Let $A(i,n)$ be the set of values of $f$ on the pairs $(i,m)$, where $m \le n$, and let $A(i,\infty) = \bigcup_n A(i,n)$. Let $\tilde f(i,n) = f(i,n)$
if $|A(i,n)| \le 1$, otherwise $\tilde f(i,n) = (\Lambda, 1)$. Let $\tilde A(i)$ be the set of values of $\tilde f$ on the pairs $(i,n)$. Obviously
$\mu_{\tilde A(i)} = \mu_{A(i,\infty)}$ iff $|A(i,\infty)| \le 1$. Thus, the family $\mu_{\tilde A(i)}$ enumerates all the normalized r.e. semimeasures (and
only them). The semimeasure $M(x) = \sum_{i \ge 1} 2^{-i} \mu_{\tilde A(i)}(x)$ is obviously r.e. and normalized, and exceeds any other such
semimeasure to within a constant factor: $M \ge 2^{-i} \mu_{\tilde A(i)}$. Q.E.D.
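The final step of the proof, mixing an enumeration of semimeasures with weights $2^{-i}$, can be illustrated on a finite family of computable semimeasures over binary words. The binary alphabet and the three example semimeasures are assumptions for illustration; the actual $M$ mixes an infinite r.e. enumeration and cannot be evaluated this way.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, List

Semimeasure = Callable[[str], Fraction]


def strings(length: int) -> List[str]:
    """All binary words of the given length."""
    return ["".join(bits) for bits in product("01", repeat=length)]


def mixture(family: List[Semimeasure]) -> Semimeasure:
    """M(x) = sum_{i >= 1} 2^-i mu_i(x), the weighted sum used in the proof of Theorem 1."""
    def M(x: str) -> Fraction:
        return sum((Fraction(1, 2 ** i) * mu(x) for i, mu in enumerate(family, start=1)),
                   Fraction(0))
    return M


family: List[Semimeasure] = [
    lambda x: Fraction(1, 2 ** len(x)),                         # uniform measure
    lambda x: Fraction(1, 3 ** len(x)),                         # semimeasure losing mass
    lambda x: Fraction(1) if set(x) <= {"0"} else Fraction(0),  # all mass on 000...
]
M = mixture(family)
```

A positive combination of semimeasures is again a semimeasure, and by construction $\mu_i \le 2^i M$ for each member; with three members $M(\Lambda) = 7/8$, whereas the full infinite mixture of normalized semimeasures has $M(\Lambda) = 1$.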


This semimeasure is called universal and is denoted by $M$. It is the central technical concept of
this work. Being the largest (to within a constant factor) among all r.e. semimeasures, it determines
the broadest class of sets $A \subset \Omega$ of positive measure.


In mathematical statistics one tries, given $\alpha$, to find a probability distribution $\mu$ for which it
would be reasonable to assert that “$\alpha$ is random with respect to $\mu$”. This usually means that certain
properties of $\alpha$ (i.e. sets $A \subset \Omega$ containing $\alpha$) have positive probability with respect to $\mu$. But the
latter assertion is weakest in the case $\mu = M$. So we can take $M$ a priori, before studying what
the properties of $\alpha$ really are. For this reason we call $M$ the a priori probability distribution.


Let us express this in other words. Suppose we want to predict the properties of some unknown
sequence $\alpha$. The assumption that $\alpha$ occurs randomly with probability distribution $\mu$ allows us to
conclude that $\alpha$ will have a property $A$ when the probability $\mu(\neg A)$ of the opposite property equals 0.
The class of such properties is narrowest in the case $\mu = M$, and therefore these properties
can be presumed before determining what $\mu$ really is. Thus, before determining the nature of a
random process, one may assume a priori those properties of an outcome which are certain to hold
for the random process with distribution $M$. This justifies calling $M$ the a priori probability.


$M$ has all the properties necessary for the construction of an inductive inference theory in
accordance with the ideas of R.J. Solomonoff [S64], but we cannot go into this question here. In
what follows we will regard $M$ as the a priori probability, in the sense in which this concept is used
in statistics. Let us note that if $\mu$ is an r.e. semimeasure, then with probability 1 (with respect to $\mu$) a sequence
$\alpha$ is such that the values $\mu(\alpha_n)$ and $M(\alpha_n)$ agree to within a factor independent of $n$. This property of
$\alpha$ can be used as a definition of the concept of “a sequence random with respect to the probability
distribution $\mu$”. We do not attempt to explore fully the properties of $M$ as the a priori probability;
our main application of $M$ is to algorithmic information theory.
1.2 DISCRETE CASE: COMPLEXITY AND INFORMATION


Before introducing our concepts for the Cantor set $\Omega$, let us consider the simpler space $\mathbb{N}$. Let
$m$ be obtained by applying $M$ to $\mathbb{N}$, namely, define $m(k) = M\{\alpha: \alpha(0) = k\}$. The definitions of $M$ and
$m$ imply trivially that $m \in \tilde{l}_1$ and is, in fact, the largest element of $\tilde{l}_1$ to within a constant factor.


The complexity of $n \in \mathbb{N}$ is $K(n) = -\lfloor \log_2 m(n) \rfloor$. As it turns out, this function gives the
length of the shortest code for $n$ under an optimal self-delimiting coding:


An algorithm $A: B \to \mathbb{N}$ is called self-delimiting if $A(p_1) = A(p_2)$ for any $p_1, p_2$ such that $p_1 \subset p_2$
and both $A(p_1)$ and $A(p_2)$ are defined. This means that if $A$ has produced a result on some input, it
cannot produce anything different on any continuation of this input. Informally, the algorithm $A$
recognizes the end of the “essential part of the input” and pays no attention to further symbols.
Such an algorithm needs no special symbol to mark the end of the input, and its input alphabet
is consequently “authentically binary”.
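A minimal sketch of a self-delimiting decoder, assuming the Elias gamma code (a standard prefix-free code chosen here only for illustration; with about $2\log_2 n$ bits it is far from the optimal coding of Proposition 2). The decoder stops at the end of the codeword without looking at later symbols, which is also why the concatenation $p_1 p_2$ of two codewords determines the pair.

```python
from typing import Optional, Tuple


def encode(n: int) -> str:
    """Elias gamma code of n + 1 (shifted so that n = 0 is allowed)."""
    b = bin(n + 1)[2:]              # binary representation, starts with "1"
    return "0" * (len(b) - 1) + b   # the length is announced in unary by leading zeros


def decode(bits: str) -> Optional[Tuple[int, int]]:
    """Read one codeword from the front of bits; return (n, number of bits consumed).

    Returns None when bits is too short, i.e. the algorithm is undefined there.
    Symbols after the codeword are never examined.
    """
    zeros = 0
    while zeros < len(bits) and bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    if end > len(bits):
        return None
    return int(bits[zeros:end], 2) - 1, end


def decode_pair(bits: str) -> Optional[Tuple[int, int]]:
    """Decode a concatenation p1 p2 of two codewords, as when encoding a pair (n, m)."""
    first = decode(bits)
    if first is None:
        return None
    n, used = first
    second = decode(bits[used:])
    if second is None:
        return None
    return n, second[0]
```

Appending arbitrary bits to `encode(n)` does not change the result of `decode`, which is the self-delimiting property stated above.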


Proposition 2 (about coding): There exists a self-delimiting algorithm $A$ capable of generating
any $n \in \mathbb{N}$ from an input of length $K(n)$; more precisely, $\exists A\ \forall n\ \exists p\ (A(p) = n$ and $l(p) = K(n)+1)$.

No self-delimiting algorithm $A'$ can be better by more than an additive constant, i.e.
$\forall A'\ \exists c\ \forall n\ \forall p\ (A'(p) = n \Rightarrow l(p) \ge K(n) - c)$.


Therefore, the value $K(n)$ defines, to within an additive constant, the minimal amount of information
necessary to determine $n$. This corresponds to Shannon’s idea that the amount of information in an
event equals the negative logarithm of its probability. $K(n)$ is the full amount of information in $n$.


Proof of Proposition 2: The proof is almost obvious. We need to show, for an arbitrary function
$K': \mathbb{N} \to \mathbb{N}$, that $2^{-K'} \in \tilde{l}_1$ (see 0.2) iff a constant $C$ and a self-delimiting algorithm $A$ exist such that
$K'(x) + C = \min\{l(q): A(q) = x\}$. Being self-delimiting, $A$ cannot take different values on segments
of the same sequence, and can be considered as a function on $\Omega_2 = \{0,1\}^{\mathbb{N}}$. The natural measure $B_0$ on $\Omega_2$
is $B_0(\Gamma_x) = 2^{-l(x)}$. If $K'(x) + C = \min\{l(q): A(q) = x\}$ then, obviously, $2^{-(K'(x)+C)} \le B_0\{\alpha: A(\alpha) = x\}$.
Then $\sum_x 2^{-K'(x)-C} \le 1$, so $2^{-K'} \in l_1$; moreover $2^{-K'}$ is r.e. because the graph of $A$ is, hence $2^{-K'} \in \tilde{l}_1$.

Conversely, if $2^{-K'} \in \tilde{l}_1$ then the set $E = \{(x,n): n \ge K'(x)\}$ is r.e. Let $\mu(x,n) = 2^{-n}$ if $n \ge K'(x)$,
and $\mu(x,n) = 0$ otherwise. Then $C = 2 + \lfloor \log_2 \sum \mu(x,n) \rfloor \le 2 + \log_2 \sum_x 2^{1-K'(x)} < \infty$. It is easy to find
a recursive bijection $(x,n) \mapsto q_{x,n}$ of $E$ onto a self-delimiting (prefix-free) set $E' \subset B$ such that $B_0(\Gamma_{q_{x,n}}) = 2^{-(n+C)}$,
and thus $l(q_{x,n}) = n + C$. The desired algorithm $A$ maps $q_{x,n}$ to $x$. Q.E.D.
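The “easy to find” recursive assignment of codewords $q_{x,n}$ with prescribed lengths $n + C$ can be realized, for instance, by the online Kraft-Chaitin allocation, which we assume here as one standard way to do it: length requests arrive one at a time, and each receives a binary word so that all words issued stay prefix-free, as long as the total $\sum 2^{-\text{length}}$ never exceeds 1 (in the proof this total is at most $2^{-C}\sum\mu \le 1/2$). The class name is hypothetical.

```python
from typing import Dict


class KraftChaitin:
    """Online assignment of prefix-free binary codewords with requested lengths.

    Invariant: the free part of the binary tree is a set of words with pairwise
    distinct lengths, stored as {length: word}. Requests succeed whenever the
    total requested weight sum(2**-length) does not exceed 1.
    """

    def __init__(self) -> None:
        self.free: Dict[int, str] = {0: ""}  # initially the whole tree is free

    def request(self, length: int) -> str:
        candidates = [k for k in self.free if k <= length]
        if not candidates:
            raise ValueError("total requested weight exceeds 1")
        k = max(candidates)  # the smallest free interval that is large enough
        w = self.free.pop(k)
        # Split w into the codeword w0...0 of the requested length and the free
        # words w1, w01, w001, ... whose lengths k+1..length were not in use.
        for j in range(length - k):
            self.free[k + j + 1] = w + "0" * j + "1"
        return w + "0" * (length - k)
```

If no free word of length at most `length` existed, all free words would be shorter intervals of pairwise distinct sizes, with total measure below $2^{-\text{length}}$, contradicting the weight bound; this is why the requests never fail.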


The code of a pair $(n,m)$ of numbers can be shorter than $K(n) + K(m)$ because $n$ and $m$ may
contain mutual information, which then needs to be coded only once.


Definition 2: The value $I(n:m) = K(n) + K(m) - K(n,m)$ is called the amount of mutual
information in $n$ and $m$.


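Remark: The self-delimitedness of the coding algorithms implies that $I(n:m) \overset{+}{>} 0$ (i.e. $I(n:m) \ge -c$ for a
constant $c$), since the pair $(n,m)$ can be encoded by $p_1 p_2$, where $p_1$ and $p_2$ are the shortest codes for $n$ and $m$
respectively: a self-delimiting decoder finds where $p_1$ ends, so $K(n,m) \overset{+}{<} K(n) + K(m)$.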


1.3 DISCRETE CASE: RANDOMNESS AND INDEPENDENCE 


In order to arrive at the expression $I(x:y)$ from another point of view, let us consider some
problems connected with the concept of randomness. In 1.1 we mentioned the possibility of characterizing
the properties of sequences occurring randomly with a probability distribution $\mu$ by the
boundedness of the ratio of $M$ to $\mu$ on their segments. Now we touch upon this matter in the simplest
case: $x \in \mathbb{N}$. When considering a probability distribution on a countable set, we usually cannot talk
about “properties random objects must have”, since usually only the whole space has probability 1.
Instead we have to consider quantities which must be small on random objects. (This means that
a given “test” takes large values only with small probability.)


Let $\mu$ be a recursive measure on $\mathbb{N}$, $\mu: \mathbb{N} \to \mathbb{R}^+$, $\sum_x \mu(x) = 1$. Let us call a randomness test
with respect to $\mu$, or a $\mu$-test, any r.e. function $\delta: \mathbb{N} \to \mathbb{N}$ which satisfies the Martin-Löf condition:
$\forall n\ \log_2 \mu\{x: \delta(x) \ge n\} \le -n$. Let $m$ be the universal semimeasure on $\mathbb{N}$ defined in 1.2.


Proposition 3: 

a) For any recursive measure $\mu$ the function $d(x/\mu) = \lfloor \log_2(m(x)/\mu(x)) \rfloor$ is a $\mu$-test.
b) For any recursive measure $\mu$ and $\mu$-test $\delta$, $\delta(x) \lesssim d(x/\mu)$.

c) $m$ is maximal to within a constant factor among all functions for which a) holds.


Proposition 3 indicates that $d(x/\mu)$ is, in a sense, a universal characteristic of “non-randomness”,
and we call it the randomness deficiency of $x$ with respect to $\mu$. Motivations of the concept of
randomness are discussed in Chapter 2.


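Proof: a) Obviously $d$ is r.e. Suppose $\mu\{x: \log_2(m(x)/\mu(x)) \ge n\} > 2^{-n}$. Then $\mu\{x: m(x) \ge 2^n \mu(x)\} > 2^{-n}$.
Then $m\{x: m(x) \ge 2^n \mu(x)\} \ge 2^n \mu\{x: m(x) \ge 2^n \mu(x)\} > 1$, which contradicts the normality of $m$.

b) Let $\gamma(x) = \mu(x) 2^{\delta(x)}/(\delta(x)+1)^2$. Then $\sum_x \gamma(x) \le \sum_n (2^n/(n+1)^2)\, \mu\{x: \delta(x) \ge n\} \le
\sum_n (2^n/(n+1)^2)\, 2^{-n} = \sum_n 1/(n+1)^2 < \infty$. Thus $\gamma$ is (up to a constant factor) an r.e. semimeasure. Then $\gamma(x) \overset{*}{<} m(x)$,
i.e. $\delta(x) - 2\log_2(\delta(x)+1) \overset{+}{<} d(x/\mu)$, which implies the required inequality.

c) Let $\tilde m$ satisfy a) as well as $m$. Obviously $\tilde m$ is r.e. It remains to show that $\tilde m$ is a semimeasure,
i.e. $\sum_x \tilde m(x) < \infty$. Suppose $\sum_x \tilde m(x) = \infty$. Then, obviously, a recursive function $m'(x) \le \tilde m(x)$
exists such that $\sum_x m'(x) = \infty$. Let $s(x) = \lfloor \log_2(2 + \sum_{y \le x} m'(y)) \rfloor \ge 1$. Let $\mu(x) = m'(x) 2^{-s(x)}/s^2(x)$. Then
$\mu$ is (up to normalization) a recursive measure, since $\sum_x \mu(x) \overset{*}{<} \sum_n 1/n^2 < \infty$. Obviously
$\mu\{x: \tilde m(x)/\mu(x) \ge 2^n\} \ge \mu\{x: m'(x)/\mu(x) \ge 2^n\} \ge \mu\{x: s(x) \ge n\} \overset{*}{>} 1/n^2$, which contradicts the Martin-Löf condition. Q.E.D.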
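Part a) of the proof uses only the fact that the values of $m$ sum to at most 1: $\mu\{x: m(x) \ge 2^n\mu(x)\} \le 2^{-n} \sum_x m(x) \le 2^{-n}$. Since $m$ itself is not computable, the sketch below substitutes a hypothetical computable function $\nu$ with $\sum \nu \le 1$ on a finite set and checks the Martin-Löf condition for the resulting deficiency with exact rational arithmetic.

```python
from fractions import Fraction


def floor_log2(r: Fraction) -> int:
    """Exact floor(log2 r) for a positive rational r."""
    n = 0
    while r >= 2:
        r /= 2
        n += 1
    while r < 1:
        r *= 2
        n -= 1
    return n


N = 64
mu = {x: Fraction(1, N) for x in range(N)}                  # a recursive measure (uniform)
nu = {x: Fraction(1, (x + 1) * (x + 2)) for x in range(N)}   # stand-in for m: sum(nu) <= 1
d = {x: floor_log2(nu[x] / mu[x]) for x in range(N)}         # analogue of d(x/mu)


def ml_condition_holds(n: int) -> bool:
    """Martin-Löf condition at level n: mu{x : d(x) >= n} <= 2^-n."""
    return sum(mu[x] for x in range(N) if d[x] >= n) <= Fraction(1, 2 ** n)
```

Here $d(0) = 5$ because $\nu(0)/\mu(0) = 32$; the condition holds at every level for any such $\nu$, by the same counting as in the proof.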


Let two random variables, defined on the same probability space, be independent and have
the same distribution $\mu$. This is equivalent to the fact that their joint distribution is $\mu \otimes \mu$, where
$\mu \otimes \mu(a,b) = \mu(a)\mu(b)$. Suppose the properties of the pair $(x,y) \in \mathbb{N}^2$ correspond to the results of a
random process with distribution $\bar m = m \otimes m$, i.e. $d((x,y)/\bar m)$ is small. What is the intuitive meaning
of this? The same as that of the assertion that “$(x,y)$ has the properties of the results of a pair of two
independent random processes, and each of $x$ and $y$ has the properties of the results of a random
process with distribution $m$”. The second part of this assertion is vacuously true, since all numbers
have the properties of the results of a random process with the a priori distribution $m$: $d(x/m) = 0$.


Therefore, the smallness of $d((x,y)/\bar m)$ means only that $(x,y)$ has the properties of a pair of
objects generated in an arbitrary way but independently of each other. It is natural to consider the
value $d((x,y)/\bar m)$ as the deficiency of independence. Obviously $d((x,y)/\bar m) \overset{+}{=} I(x:y)$, since
$\log_2(m(x,y)/(m(x)m(y))) \overset{+}{=} K(x) + K(y) - K(x,y)$! This is consistent
with the theorem of classical probabilistic information theory stating that two random variables are
independent if and only if the mutual information between them equals 0. The difference is that the
concepts given above apply to the individual values themselves, and not only to probability
distributions (i.e. random variables) on the set of values.
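The classical side of this comparison can be computed for finite distributions. The sketch below evaluates the Shannon mutual information and the per-pair quantity $\log_2 p(x,y)/(p(x)p(y))$, whose expectation it is; the two example distributions are hypothetical. Note that even the per-pair quantity is still defined through the distribution, unlike the algorithmic $d((x,y)/\bar m)$, which is not computable.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]


def marginals(joint: Joint) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """Marginal distributions p(x) and p(y) of a finite joint distribution."""
    px: Dict[Hashable, float] = defaultdict(float)
    py: Dict[Hashable, float] = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py


def pointwise_information(joint: Joint, x: Hashable, y: Hashable) -> float:
    """log2 p(x, y) / (p(x) p(y)) for one pair of values."""
    px, py = marginals(joint)
    return math.log2(joint[(x, y)] / (px[x] * py[y]))


def mutual_information(joint: Joint) -> float:
    """Shannon mutual information in bits: the expectation of the pointwise quantity."""
    px, py = marginals(joint)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


a = {0: 0.25, 1: 0.75}
b = {0: 0.5, 1: 0.5}
independent = {(x, y): a[x] * b[y] for x in a for y in b}  # product distribution
copied = {(0, 0): 0.5, (1, 1): 0.5}                        # y is a copy of a fair bit x
```

For the product distribution both quantities vanish; for the copied bit the mutual information is exactly one bit.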
1.4 CONSERVATION OF INDEPENDENCE


The information $I(x:y)$ has a remarkable property: it does not increase under any random or algorithmic
(deterministic) processing of $x$ or $y$, and hence under none of their combinations. On the one hand this
is natural, since if $x$ contains no information about $y$, then there is little hope of finding out something about
$y$ by subjecting $x$ to various kinds of processing. (Torturing an uninformed witness cannot give
information about the crime!) On the other hand, this may seem to conflict with the common experience that the
Monte Carlo method solves many problems which are intractable without a random number generator. The
clue here is that one can always solve these problems by computing the probability distribution of
the results of processing the random input. For this one needs to consider all possible inputs (instead
of a single random one), which is an unrealistic volume of work. Even so, theoretically, the Monte Carlo
method produces no “absolutely new” possibilities in this respect.


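Theorem 4 (Independence Conservation):

Let $A: \mathbb{N} \to \mathbb{N}$ be a recursive function, and $\mu$ be an r.e. measure on $\mathbb{N}$. Then
1) $I(x:y) \overset{+}{>} I(A(x):y)$,

2) $2^{I(x:y)} \overset{*}{>} E_{\mu(z)}\, 2^{I((x,z):y)}$, where $E_{\mu(z)}$ means mathematical expectation over $z$ distributed according to $\mu$.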


Proof: $I(x:y) \overset{+}{=} I((x, A(x)):y)$, since $x$ and $(x, A(x))$ are computable from each other. It remains to
prove that $I((x,z):y) \overset{+}{>} I(z:y)$ for $z = A(x)$ (in fact, for all $x, z$). This reduces to $K(x,y,z) \overset{+}{<} K(z,y) + K(z,x) - K(z)$.
We need an elegant lemma of Peter Gács:


Lemma 1: $K(t, K(t)) \overset{+}{=} K(t)$.


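Indeed, let $p$ be the shortest code for $t$. Obviously $t$ is computable from $p$, and so is $K(t)$ up to a bounded
correction, since $K(t) \overset{+}{=} l(p)$ by Proposition 2. Therefore the complexity of $(t, K(t))$ is $\overset{+}{<} l(p) \overset{+}{=} K(t)$; the opposite inequality is obvious.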


Definition: An r.e. function $m(\cdot/\cdot): \mathbb{N} \times \mathbb{N} \to \mathbb{R}^+$, largest to within a constant factor among the r.e. functions
with $\sup_y \sum_x m(x/y) < \infty$, is called the universal conditional measure. $K(x/y) = -\lfloor \log_2 m(x/y) \rfloor$.


Lemma 2: $K(x,y) \overset{+}{=} K(y/(x, K(x))) + K(x)$.


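Let $m_\infty(y/x,n) = m(x,y) 2^n$. There exists a recursive sequence $m_k(y/x,n): A_k \to \mathbb{Q}^+$, nondecreasing in $k$,
such that $m_\infty = \sup_k m_k$, where the $A_k$ are finite subsets of $\mathbb{N}^3$. Let $\tilde m(y/x,n) = \sup_k \{m_k(y/x,n):
\sum_y m_k(y/x,n) \le 1\}$. Obviously $\forall x, n\ \sum_y \tilde m(y/x,n) \le 1$ (thus $\tilde m(\cdot/\cdot) \overset{*}{<} m(\cdot/\cdot)$), and $\forall x, n$, if $\sum_y m(x,y) \le 2^{-n}$,
then $\tilde m(y/x,n) = m_\infty(y/x,n)$. Since $\sum_y m(x,y) \overset{*}{<} m(x)$, this condition holds for $n = K(x) - c$ with a suitable constant $c$,
and then $m(y/(x,n)) \overset{*}{>} \tilde m(y/x,n) = m_\infty(y/x,n) = 2^n m(x,y)$. As $(x, K(x) - c)$ and $(x, K(x))$ are computable from each other, $K(y/(x,K(x))) \overset{+}{<} K(x,y) - K(x)$.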

It remains to prove that $K(y/(x,K(x))) \overset{+}{>} K(x,y) - K(x) \overset{+}{=} K(x,y) - K(x,K(x))$. This follows from
the facts that $K(x,y) \overset{+}{<} K(y, x, K(x))$, $K(x) \overset{+}{=} K(x,K(x))$ and $K(y,t) \overset{+}{<} K(t) + K(y/t)$. The latter inequality
holds since $m'(y,t) = m(t)\, m(y/t)$ is obviously an r.e. semimeasure, and hence $m'(y,t) \overset{*}{<} m(y,t)$.
Analogously one obtains $K(x,y,z) \overset{+}{<} K(z,K(z)) + K(y/(z,K(z))) + K(x/(z,K(z)))$.

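Now item 1) follows from the observation that

$K(z,y) + K(z,x) - K(z) \overset{+}{=} K(y/(z,K(z))) + K(z,K(z)) + K(x/(z,K(z))) + K(z,K(z)) - K(z,K(z))$,

whose right-hand side equals $K(z,K(z)) + K(y/(z,K(z))) + K(x/(z,K(z))) \overset{+}{>} K(x,y,z)$.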


For the proof of 2) one needs to show that $m(x,y)/(m(x)m(y)) \overset{*}{>} E_{\mu(z)}\, m(x,y,z)/(m(y)m(x,z))$,
which can be reduced to $E_{\mu(z)}\, m(x,y,z)/m(x,z) \overset{*}{<} m(x,y)/m(x)$. Since $m(z) \overset{*}{>} \mu(z)$, it suffices to show
$\sum_z m(z)\, m(x,y,z)/m(x,z) \overset{*}{<} m(x,y)/m(x)$, i.e. $\sum_z m(x)m(z)m(x,y,z)/m(x,z) \overset{*}{<} m(x,y)$. The latter inequality
follows from the obvious ones: $m(x)m(z) \overset{*}{<} m(x,z)$ and $\sum_z m(x,y,z) \overset{*}{<} m(x,y)$. Q.E.D.
Stating item 2) on the linear scale rather than the logarithmic one makes it stronger: by Jensen’s inequality, $E\, I \le \log_2 E\, 2^{I}$, so a bound on the expectation of $2^{I}$ also bounds the expectation of $I$.
This scale is natural to use with linear operations such as mathematical expectation. Theorem 4 is
formulated to within additive constants independent of $x, y$, but dependent on $A$ or $\mu$ (bounded by
$K(A)$ and $K(\mu)$ respectively). Item 2) concerns not $z$ itself but only the mathematical expectation over
it, i.e. the information may increase in a random process, but only with negligible probability (by $n$
bits with probability $O(2^{-n})$). Both reservations are unremovable, since one can increase information
by randomly guessing $n$ symbols of $y$, or by means of an algorithm $A$ that already has these symbols
in its program. This does not diminish the meaning of the Theorem, since one should consider this
additional information as having been present originally in the program of $A$ (or “in our luck” in the
case of guessing) rather than arising from the processing of $x$. Theorem 4 excludes any more efficient
possibilities. Processes more complex than those in Theorem 4 can be obtained by combining its items
(e.g. a generalization of 2) obtained by substituting $\mu_x(z)$, dependent on $x$, for $\mu(z)$). Theorem 4 also implies
that information does not increase in any combination of random and deterministic (recursive) processes.
This supports the Principle below about the conservation of independence in any physically realizable
process of information transformation.
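The classical analogue of Theorem 4 is the data-processing inequality for Shannon mutual information: processing $X$ by a deterministic function, or by a random channel acting independently of $Y$, cannot increase its information about $Y$. The sketch below checks this on a small hypothetical joint distribution; it illustrates the classical analogy only and does not compute the algorithmic quantity $I(x:y)$.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]
Channel = Callable[[Hashable], Dict[Hashable, float]]


def mutual_information(joint: Joint) -> float:
    """Shannon mutual information (in bits) of a finite joint distribution p(x, y)."""
    px: Dict[Hashable, float] = defaultdict(float)
    py: Dict[Hashable, float] = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def process(joint: Joint, channel: Channel) -> Joint:
    """Joint distribution of (Z, Y), where Z is drawn from channel(X) independently of Y given X."""
    out: Joint = defaultdict(float)
    for (x, y), p in joint.items():
        for z, pz in channel(x).items():
            out[(z, y)] += p * pz
    return dict(out)


def halve(x: int) -> Dict[int, float]:
    """Deterministic processing A(x) = x // 2."""
    return {x // 2: 1.0}


def noisy(x: int) -> Dict[int, float]:
    """Random processing: keep x with probability 0.9, else shift it by 1 (mod 4)."""
    return {x: 0.9, (x + 1) % 4: 0.1}


joint = {(x, x): 0.25 for x in range(4)}  # Y = X, with X uniform on {0, 1, 2, 3}
```

Here $X$ carries 2 bits about $Y$; after `halve` only 1 bit remains, and `noisy` also lowers the value. Neither kind of processing can raise it.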


The following formulation and discussion of this Principle is a deviation from the formal account, given to
motivate the formal results. In support of the Independence Principle one may say that it is usually possible
to “explain” known physical processes. To “explain” means to reduce them to simpler ones in combination with
recursive and random transformations. General ideas about the development of the physical universe, on the whole,
also assume that it was originally in a state of random movement of a hot plasma and was then transformed according
to the (recursive) equations of quantum mechanics (additional randomness appears in the observation processes). Clearly,
not being a mathematical assertion (the physical world is not defined mathematically), the Independence
Principle (like, for example, Church’s thesis) cannot be proved.

The Principle will be used in further chapters for the case of infinite sequences, for which independence means
finiteness of the mutual information. Clearly, such an understanding is not suitable for finite objects. What
we mean here by the independence of $x, y \in \mathbb{N}$ is the smallness of $I(x:y)$. Thus it is not an absolute property (as in
the case of infinite sequences), but rather a quantitative characteristic ($I(x:y)$ is the “deficiency of independence”,
see Section 1.3). The prediction that $x$ and $y$ are independent means that for any $n$ the degree of our certainty that
$I(x:y) < n$ is the same as that the first $n$ results of an “honest” coin-tossing game will not all be zeros. So,


Independence Principle:

If $x$ is a sequence generated by a process in the physical world, and $y$ is one determined
(ineffectively) by a property $P(y)$ formulated with no reference to events of the physical world,
then $x$ and $y$ are independent to within the formulation length of $P$, i.e. $I(x:y) - l(P)$ is small.


This Principle is nontrivial only for those (ineffective) properties $P$ which determine sequences with complexity
essentially bigger than the length of $P$. As an example, $x$ might be the collection of publications of the American
Mathematical Society and $y$ the list of all true arithmetical assertions of length less than 1,000,000,000.


1.5 RANDOMNESS AND INFORMATION FOR INFINITE SEQUENCES 


For a fully satisfactory extension of our concepts to the space $\Omega$ one would need a non-intuitive technique
of functional analysis. The following notions are not perfect, but clearly connected to the preceding
sections. The next definition is a version of a definition from [L73]. In the special case of the uniform
measure it is equivalent to a definition from [Ch75].


Definition 3: The value $D(\alpha/\mu) = \lfloor \log_2 \sup_n (M(\alpha_n)/\mu(\alpha_n)) \rfloor$ is called the deficiency of
randomness of $\alpha \in \Omega$ with respect to a semimeasure $\mu$.


Definition 4: The value $I(\alpha:\beta) = D((\alpha,\beta)/M \otimes M)$ is called the amount of information in $\alpha$
about $\beta$, or the deficiency of their independence.


Any r.e. measure is recursive and thus computable with any accuracy, but the values of an r.e. semimeasure on
the segments of a sequence can in general be effectively approximated only from below. In particular, any
r.e. set of recursive upper bounds to $M$ is bounded from below. But it may be known about some $\alpha \in \Omega$
that on its segments the r.e. semimeasure $M$ agrees with some r.e. measure $\mu$ to within a constant
factor. Then, computing $\mu$, we can find $M(\alpha_n)$ to within a factor (or $K(\alpha_n) = -\lfloor \log_2 M(\alpha_n) \rfloor$ to within
an additive constant). Such $\alpha$ we call complete, denoted $\alpha \in C$: $\exists \mu\ \sup_n (M(\alpha_n)/\mu(\alpha_n)) < \infty$.
This means that $\alpha$ contains all the information necessary for computing the complexity of its segments.
By Proposition 5, $C$ is very extensive. By virtue of its item 2), any sequence $\alpha$ satisfying the
Independence Principle (as $x$) has a completion $(\alpha,\beta) \in C$ satisfying this Principle as well.


Proposition 5:

1) The set $C$ is closed under the application of any total recursive operator ($A(C) \subset C$), and
the complement of $C$ has measure 0 in any recursive measure.

2) Let $\gamma$ be a sequence to which a universal r.e. set is Turing reducible, and let $\alpha$ be independent
of $\gamma$. Then there exists $\beta \in \Omega$ such that $(\alpha,\beta)$ is complete and independent of $\gamma$.


Proof: Let $\delta_\mu(\alpha) = \log_2 \sup_n (M(\alpha_n)/\mu(\alpha_n))$. Analogously with Proposition 3, $\delta_\mu$ is a Martin-Löf
$\mu$-test (Definition 5). Let $\alpha \in C$. Then $\exists \mu: \delta_\mu(\alpha) < \infty$. Let $\mu'(x) = \mu\{\alpha: A(\alpha) \supset x\}$. Then $\mu'$ is also an r.e.
measure, and $\delta_{\mu'}(A(\alpha))$ is a Martin-Löf $\mu$-test. Then, by virtue of Proposition 6, $\delta_{\mu'}(A(\alpha)) \overset{+}{<} D(\alpha/\mu)$.
Obviously $D(\alpha/\mu) \overset{+}{<} \delta_\mu(\alpha)$. By our assumption $\delta_\mu(\alpha) < \infty$. Therefore $\delta_{\mu'}(A(\alpha)) < \infty$ and $A(\alpha) \in C$.
Obviously $\mu(C) = 1$, because $\delta_\mu$ is a Martin-Löf $\mu$-test.

It remains to prove 2). In Section 3.2 of [L70] it is shown that $M$ (like any other r.e. semimeasure)
can be obtained by means of a partial recursive operator $A$ from a recursive measure $\mu$: $M(x) =
\mu\{\alpha: A(\alpha) \supset x\}$. Let $A'(\alpha) = (A(\alpha), t_A(\alpha))$, where $t_A(\alpha)$ is the sequence of computation times of the
terms of $A(\alpha)$. The operator $A'$ is total and, hence, $\mu'(x) = \mu\{\alpha: A'(\alpha) \supset x\}$ is a recursive measure.
$M$ is generated from $\mu'$ by the projection $(\alpha, t) \mapsto \alpha$. By Proposition 7, $\mu'\{(\alpha,t): I((\alpha,t):\gamma) = \infty\} = 0$.
Also $\mu'(\Omega \setminus C) = 0$. Therefore, $M\{\alpha: \forall t \in \Omega\ ((\alpha,t) \notin C \vee I((\alpha,t):\gamma) = \infty)\} = 0$. By Lemma 3, for
any set $A$ such that $M(A) = 0$, a sequence $\beta$ exists on which all elements of $A$ depend. The same
holds for any sequence to which $\beta$ is reducible. Using the reducibility to $\gamma$ of the universal r.e. set, one
can routinely check that the necessary $\beta$ is computable relative to $\gamma$. Thus all sequences $\alpha$ that cannot be
completed to a complete sequence $(\alpha,t)$ independent of $\gamma$ depend on $\gamma$; an $\alpha$ independent of $\gamma$ therefore has such a completion. Q.E.D.


Here $\alpha$ comes from $(\alpha,\beta)$ by a partial recursive (but not total) projection operator. As V’jugin
[L.77] has shown, partial operators can lead out of $C$ if the time of their work is bounded by no total
recursive operator. In Chapter 3 we postulate, along with a version of the Principle of independence 
conservation, an axiom that means intuitively that every sequence in the physical world comes from 
a complete one as a result of the application of a partial recursive operator. 
1.6 TIME OF COMPUTATION


In this work the computational resources necessary for the enumeration of various r.e. sets are, as
a rule, ignored. Now we touch on this question briefly. Let $t_A(p)$ denote the time of the computation of $A(p)$. If
$K(s) \le n$, then $\exists q: l(q) \le n+1,\ A(q) = s$, where $A$ is the optimal algorithm from Proposition 2. This $q$ can
be found by searching through all words of length at most $n+1$. This requires very large (exponential) time,
even if $t_A(q)$ is linear. Now we give the optimal (in time) algorithm for searching for $q$. We assume
that algorithms are realized by the storage modification machines of Kolmogorov-Uspenskii.


Let $Kt_A(x/y) = \min\{l(p) + \log_2 t_A(p,y): A(p,y) = x\}$, where $p$ is a binary sequence without a termination
mark: the algorithm $A$ can receive, on request, the symbols of $p$ in order until $p$ is ended;
in case of further requests $A$ gets no reply and gives no output. $Kt_A(x) = Kt_A(x/\Lambda)$. Analogously
with Theorem 1, an algorithm $A$ exists such that $Kt_A$ is minimal up to an additive constant, and we
will denote this $Kt_A$ by $Kt$. There exists an algorithm $G(n,y)$ generating the list $\{x: Kt(x/y) \le n\}$ in time
$2^n$; and, up to a constant factor, $Kt$ is a minimal function with this property. (The asymptotically
minimal one is $kt(x) = \min\{Kt(a): x \in a \subset \mathbb{N},\ |a| < \infty\}$.)


Let $R(s,q)$ be a predicate recognizable in time $t_R(s,q) \le P(l(q))$, where $P$ is a polynomial. The
problems of finding $q$ (if it exists) satisfying $R(s,q)$ are called search problems, and the problems of
deciding $\exists q\ (R(s,q)\ \&\ l(q) \le P(l(s)))$ are called NP-problems. Without loss of generality one can
consider $P$ linear, by padding $s$ and $q$ with zeros. Searching through all $q$ in the order of increasing $Kt(q/s)$
(instead of $l(q)$) gives the fastest algorithm (up to a constant factor) for solving any search problem
(see [L73a]; related ideas have also been expressed by L. Adleman). In particular, it is optimal for
finding $q$ such that $A(q) = s$, $l(q) \le n$, $t_A(q) \overset{*}{<} l(s)$. For the case of large $t_A(q)$ a similar algorithm works:
namely in the definition of Kt, replace the expression t4(p 4) by ta-(acp,y)) = tacy,y) + Lae): 
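The time-sharing idea behind searching in order of increasing $Kt$ can be illustrated with a toy Python sketch. It is only an illustration under assumptions of our own: instead of enumerating all programs of the optimal algorithm, it uses two hypothetical "programs" (`count_down`, `trial_division`) with made-up description lengths, models a program as an iterator yielding one candidate per step, and takes as search problem $R(s,q)$ "q is a non-trivial divisor of s". In phase $i$, each program $p$ with $l(p) \le i$ gets $2^{i - l(p)}$ steps, so a program that finds a solution in $t$ steps succeeds by the phase $i \approx l(p) + \log_2 t$.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

# Hypothetical program model: (description length l(p), factory); factory(s)
# returns an iterator yielding one candidate q per computation step.
Program = Tuple[int, Callable[[int], Iterator[int]]]


def levin_search(s: int,
                 programs: Dict[str, Program],
                 R: Callable[[int, int], bool],
                 max_phase: int = 40) -> Optional[Tuple[int, str]]:
    """In phase i every program p with l(p) <= i is run for 2**(i - l(p)) steps,
    so a program that finds a solution in t steps succeeds once
    i >= l(p) + log2(t); candidates are checked with the predicate R."""
    by_length = sorted(programs.items(), key=lambda item: item[1][0])
    for phase in range(1, max_phase + 1):
        for name, (length, factory) in by_length:
            if length > phase:
                continue
            budget = 2 ** (phase - length)
            for step, q in enumerate(factory(s), start=1):
                if R(s, q):
                    return q, name
                if step >= budget:
                    break
    return None


def count_down(s: int) -> Iterator[int]:
    # Hypothetical program: tries s-1, s-2, ..., 2.
    yield from range(s - 1, 1, -1)


def trial_division(s: int) -> Iterator[int]:
    # Hypothetical program: tries 2, 3, ..., s-1.
    yield from range(2, s)


def is_factor(s: int, q: int) -> bool:
    # Search problem R(s, q): q is a non-trivial divisor of s.
    return 1 < q < s and s % q == 0


PROGRAMS = {"count_down": (2, count_down), "trial_division": (5, trial_division)}
print(levin_search(15, PROGRAMS, is_factor))  # (5, 'count_down')
```

Restarting every program from scratch in each phase is the simplest choice; because the budgets double from phase to phase, it costs only a constant factor compared with resuming suspended computations.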


Functions of the type $Kt$ are of particular interest for algorithms with random number generators.
For $f: \mathbb{N} \to \mathbb{R}^+$ let $C(f) = -\log_2 \int d\omega / (t_A(\omega) + f(A(\omega)))$, where $\omega$ is the random
variable and, as above, $A$ is the optimal algorithm minimizing $C$. For $F \subset \mathbb{N}$, $C(F)$ means $C(f)$,
where $f(x) = 0$ if $x \in F$, else $f(x) = \infty$. The above algorithm $G(n,\omega)$, generating numbers $x$ randomly,
hits any $F \subset \mathbb{N}$ in time $< 2^n$ with probability $p > 1 - 2^{-a}$, where $\log a > n - C(F)$. Obviously, for
any such algorithm $p < 2^{n - C(F) + O(1)}$. Thus $C(F)$ determines the time which is necessary, and essentially
guaranteed, for “hitting” $F$. A function $f$ with range other than just $\{\infty, 0\}$ can be interpreted as a
“price” (for instance, the time) necessary for establishing $x \in F = f^{-1}(\mathbb{R}^+)$. Everything is analogous
for $C(f/y) = -\log_2 \int d\omega / (t_A(\omega, y) + f(A(\omega, y), y))$.
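For a single fixed algorithm whose random input is a finite string $\omega \in \{0,1\}^k$, the integral defining $C(f)$ becomes a finite average and can be computed exactly. The Python sketch below does this under assumptions of our own: the randomized algorithm `toy_A` (output the number written by the random bits, one step per bit) and the helpers `C` and `indicator_cost` are hypothetical, and since the paper's $C$ uses the optimal algorithm, the value for one fixed algorithm only bounds it from above up to an additive constant.

```python
import math
from itertools import product
from typing import Callable, Tuple

Omega = Tuple[int, ...]


def C(f: Callable[[int], float],
      A: Callable[[Omega], Tuple[int, int]],
      k: int) -> float:
    """-log2 of the integral of 1/(t_A(omega) + f(A(omega))) over omega,
    uniform on {0,1}^k; A returns the pair (output, running time)."""
    total = 0.0
    for omega in product((0, 1), repeat=k):
        x, t = A(omega)
        total += 2.0 ** (-k) / (t + f(x))  # 1/(t + inf) == 0.0 when f(x) is inf
    return math.inf if total == 0 else -math.log2(total)


def toy_A(omega: Omega) -> Tuple[int, int]:
    # Hypothetical randomized algorithm: output the number written by the
    # random bits, spending one step per bit read.
    return int("".join(map(str, omega)), 2), len(omega)


def indicator_cost(F: set) -> Callable[[int], float]:
    # f(x) = 0 on F and infinity elsewhere, as in the definition of C(F).
    return lambda x: 0.0 if x in F else math.inf


print(C(indicator_cost({5}), toy_A, 3))  # about 4.585, i.e. log2(24)
```

With $F = \{5\}$ only one of the eight random strings hits $F$, and it does so after 3 steps, so the integral is $(1/8)(1/3)$ and $C(F) = \log_2 24$.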


A number of search (NP) problems are known which have no proof of quick solvability by deterministic
algorithms but are very quickly solvable by probabilistic ones, e.g. integer compositeness and
constructing “non-compressible” words $q$, i.e. words equivalent to no essentially (say twice) shorter
word $p$ ($q$ and $p$ are equivalent if they are transformable into each other by a simple and quick
algorithm). The complexity of the search problem $R(s,q)$ for probabilistic algorithms is characterized
by $C(f_s/s)$, where $f_s(q) = t_R(s,q)$ if $R$ holds, and $f_s(q) = \infty$ otherwise. The relationship of this
complexity with $l(s)$ is a “randomized” version of the P=NP problem. But the problem of its relationship
with the “complexity of obtaining $s$” looks much more interesting. More precisely: does the function of
$n$, $C(\{s : \infty > C(f_s/s) > n\})$, grow polynomially or exponentially? A short $s$ may exist for which it is
very difficult to find $q$ such that $R(s,q)$, but finding such an $s$ may be even more difficult.


Other results about the computation time may be obtained by diagonal methods. E.g. the
results of [L73b] for Turing machine space remain valid for many other types of complexity: the time
of storage modification machines, the exponent of Turing machine space, etc.




2. APPLICATION TO THE FOUNDATIONS OF PROBABILITY THEORY 


2.1 FOUNDATIONAL DIFFICULTIES (A historical digression) 


This section is not formally related to the rest of the work. Its purpose is to clarify the context in which the
problems studied in Chapter 2 arise (in particular, the problem of introducing the concept of “randomness”).


Hilbert’s sixth problem suggests “To treat in the same manner (as geometry), by means of axioms, 
those physical sciences, in which mathematics plays an important part; in the first rank are the 
theory of probabilities and mechanics.” (see [H02])


It was generally considered that this problem was completely solved in A.N. Kolmogorov’s 1933 book [K33].
However, this is only partially so. Kolmogorov’s work opened great possibilities for the development of the techniques
of probabilistic methods and their applications. At the same time, certain foundational difficulties were left
unresolved. Kolmogorov noted this in the foreword to the second Russian edition of the book, where he refers to works
by Kolmogorov, Zvonkin and Levin [K65, L70] for his new approach. The well-known earlier attempts to overcome
these difficulties by R. von Mises [vM64] and A. Church [C40] turned out to be imperfect.


The difficulties lie in the gap between intuitive probabilistic ideas and those methods which are theoretically
justifiable. The premise for the use of probabilistic methods is the assumption that the result $x$ of a physical
process arises randomly with probability distribution $\mu$. This $\mu$ is discovered or hypothesized, e.g., by
analogy with other processes and statistical data about them, considerations of symmetry, etc. Then, according to
the naive ideas, those properties of $x$ whose $\mu$-probability is 1 (approximately, in the finite case) are
regarded as probabilistic laws. E.g. when $x = x_1, \ldots, x_n$, where $x_1, \ldots, x_n$ are independent and
identically distributed (i.e. $\mu(x_1 \ldots x_n) = \mu'(x_1)\mu'(x_2)\cdots\mu'(x_n)$), the law of large numbers
plays an important role. For each property $B$ it asserts that, with $\mu$-probability close to 1, the frequency
with which $B(x_i)$ holds is close to the probability $\mu'(B)$. In any case, $x$ is predicted to obey such laws,
i.e. to have the properties whose probability is 1 (approximately, in the finite case).


The problem is that jointly the properties of probability 1 have probability 0! We cannot predict the realization
of all of them simultaneously, but must choose one or a few of them for prediction. Thus if the result had arisen
before we managed to make a prediction, we could not expect to subject this result to any statistical tests. For
example, classical theory provides no rigorous basis to doubt the honesty of the lottery director after his son wins
the first prize in ten consecutive years, if we discover this “post factum”! We cannot subject an election to
criticism when the share of votes for the ruling party in a series of consecutive years formed a sequence
$0.99K_i$, even if the $K_i$ turn out to be the digits of the decimal expansion of the number $\pi$! Of course, one
can select a few “standard laws” and rely on their predictions if this selection was fixed before the beginning of
the experiment. However, standard probability theory contains no principles which would distinguish such standard
laws from others. Besides, it would not solve the problem of applying probability theory to events which had
occurred before such a standardization (for example, in cosmology, history, geology, etc.).


The idea for resolving this paradox consists in considering as “standard” those properties of probability close to 1
(in the finite case) which are “simply expressible”. The objects not satisfying such a property form a simply
expressible set of small measure and correspondingly small cardinality. Thus any element of such a set is itself
simple, being determinable by its number in the set (smaller than the set’s cardinality) together with the simple
description of the set. This allows us, instead of indicating many simple “standard” properties, to consider a single
one: “not to be a simple object”. Kolmogorov’s algorithmic information theory, a surprising discovery, provided a
rigorous basis for the obscure notion of simplicity. In the infinite case the corresponding property is “to be
random with respect to distribution $\mu$”. Then only this property is postulated to follow from the assumption
that an object occurs randomly in a process with distribution $\mu$. This property has $\mu$-measure 1 and implies
all other “good” properties of $\mu$-measure 1. Attempts to introduce such a concept were also undertaken by von Mises
and continued by Church for distributions $\mu$ of the Bernoulli type [vM64, C40]. However, it was found [V39] that
even such standard properties as the law of the iterated logarithm do not follow from their notion of randomness
(i.e. from the property of being a collective).
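The counting step in this argument (a simply described set of small measure has few elements, so each element can be named by its index) can be made concrete for finite binary strings. The Python sketch below is only an illustration under assumptions of our own: the length `N = 16`, the set `S` of strings with at most two ones, and the helper names `in_S`, `encode`, `decode` are hypothetical choices, and the index is measured in plain bits rather than by any prefix complexity.

```python
from itertools import product

N = 16  # length of the strings (illustrative)


def in_S(x: str) -> bool:
    # A simply describable set of small uniform measure: at most two ones.
    return x.count("1") <= 2


# Listing S in lexicographic order needs only the short description of S.
S = [x for x in ("".join(b) for b in product("01", repeat=N)) if in_S(x)]
INDEX_BITS = max(1, (len(S) - 1).bit_length())


def encode(x: str) -> str:
    # Describe an element of S by its position in the list, in INDEX_BITS bits.
    return format(S.index(x), f"0{INDEX_BITS}b")


def decode(code: str) -> str:
    return S[int(code, 2)]


x = "0000010000000001"
print(len(S), INDEX_BITS, encode(x))
```

Here $S$ has uniform measure $137/2^{16} \approx 2^{-8.9}$, and each of its elements is named by an 8-bit index instead of its 16 bits; the saving of about $-\log_2 \mu(S)$ bits is what makes such elements “simple”.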
2.2 THE LAWS OF RANDOMNESS AND INDEPENDENCE


We consider below two types of properties of sequences $\alpha \in \Omega$. These properties are laws
of probability theory in the usual classical sense (i.e. the probability of their violation equals 0).
For simplicity we restrict ourselves to the case of recursive probability distributions, though these 
results can be generalized to non-recursive cases as well. 


The Law of Randomness: We say that $\alpha \in \Omega$ satisfies this law (is random) with respect to
a semimeasure $\mu$ if $D(\alpha/\mu) < \infty$ ($D$ is given in Definition 9).


This means that the values of $\mu$ on initial segments of $\alpha$ are not much smaller than those of the a priori
probability $M$ (i.e. the hypothesis that $\alpha$ has occurred randomly with probability distribution $\mu$ is
at least as consistent with reality as the a priori idea of its occurring with the distribution $M$).
As we will see below, this property implies the fulfillment of all effective probabilistic laws.


Definition 5: An r.e. function $\delta: \Omega \to \mathbb{N}$ is called a Martin-Löf test with respect to a recursive
measure $\mu$ (or a $\mu$-test) iff $\forall n\ \log_2 \mu\{\alpha : \delta(\alpha) > n\} < -n$.


It is said that a sequence withstands the test if $\delta(\alpha) < \infty$. Definition 5 is a formalization of the
concept of a “good” law of probability theory. The value of $\delta$ measures the degree of deviation from such
a law. Complete deviation occurs at $\delta(\alpha) = \infty$, the probability of which is 0. The deviations can be
effectively discovered since $\delta$ is r.e. The logarithmic scale of deviations is chosen for convenience
(the definition serves equally well with any other recursive scale).
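A finite analogue may help fix the meaning of Definition 5. The Python sketch below is an illustration under our own assumptions, not the universal test $D$: it uses the uniform measure on binary strings of a fixed length $N = 12$ and a single law (the frequency of ones should be near 1/2), and takes as deficiency $\delta(x)$ the integer part of $-\log_2$ of the probability of a deviation at least as large as that of $x$. The names `deviation`, `delta` and `satisfies_test_condition` are hypothetical; the last function checks exhaustively the analogue of the condition $\log_2 \mu\{\delta > k\} < -k$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N = 12
STRINGS = ["".join(b) for b in product("01", repeat=N)]


def deviation(x: str) -> int:
    # |2 * (#ones) - N|: how far the frequency of ones is from 1/2.
    return abs(2 * x.count("1") - N)


DEV_COUNTS = Counter(deviation(y) for y in STRINGS)


def delta(x: str) -> int:
    """floor(-log2 mu{y : deviation(y) >= deviation(x)}) for the uniform mu,
    computed exactly: the tail has c strings, so the value is N - ceil(log2 c)."""
    c = sum(cnt for d, cnt in DEV_COUNTS.items() if d >= deviation(x))
    return N - (c - 1).bit_length()


def satisfies_test_condition() -> bool:
    # Check mu{x : delta(x) > k} < 2**(-k) for every k, as in Definition 5.
    for k in range(N + 1):
        bad = sum(1 for x in STRINGS if delta(x) > k)
        if not Fraction(bad, 2 ** N) < Fraction(1, 2 ** k):
            return False
    return True
```

Balanced strings get deficiency 0, while the all-zero string gets $N - 1$, since only two strings of length $N$ deviate that far.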


Proposition 6: Let $\mu$ be a recursive measure. Then
1) $D(\alpha/\mu)$ is a Martin-Löf test with respect to $\mu$.
2) For any Martin-Löf test $\delta$ we have $\delta(\alpha) < D(\alpha/\mu) + O(1)$.


(The proof is analogous to the proof of Proposition 3.) We see that if $\alpha$ withstands the test $D$ with
respect to the measure $\mu$, then it withstands all conceivable $\mu$-tests. These tests correspond to the “good”
laws of probability theory. What is the situation with the bad ones? This question is interesting because it
clarifies the relation between the algorithmic and classical approaches to probability theory. Let us
give an important example of non-recursive laws.


The Law of Independence: Let $\gamma \in \Omega$. We say that $\alpha \in \Omega$ satisfies this law if $\alpha$ and $\gamma$ are
independent, i.e. $I(\alpha:\gamma) < \infty$. Then $I(\alpha:\gamma)$ is “the degree of deviation” from this law.


Proposition 7: For any $\gamma \in \Omega$ and r.e. semimeasure $\mu$, the value $I(\alpha:\gamma)$ satisfies the relation
$\log_2 \mu\{\alpha : I(\alpha:\gamma) > n\} < -n + O(1)$ (compare this inequality with the one in Definition 5).


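Proof: It is sufficient to show that $\log_2 M\{\alpha : I(\alpha:\beta) > n\} < -n + O(1)$. Let
$DN(\alpha/\mu) = \log_2 \sup_{x \subset \alpha} \inf_{x' \supset x} (M(x')/\mu(x'))$, and $IN(\alpha:\beta) = DN((\alpha,\beta)/M \otimes M)$. Let us first
prove, for any r.e. semimeasure $\mu$, a lower bound of $DN(\alpha/\mu)$ in terms of $D(\alpha/\mu)$. Let $\mu_t$ be a recursive
sequence of semimeasures, non-decreasing in $t$, such that $\mu_0(x') = 0$ and $\lim_{t \to \infty} \mu_t(x') = \sup_t \mu_t(x') = \mu(x')$.
Let $t(x,n) = \sup\{t' : \mu_{t'}(x) < 2^{-n} m(x)\}$, $\mu_{n,x} = \mu_{t(x,n),x}$ and $\mu' = \sum_{n,x} (2^n/n^2)\,\mu_{n,x}$. Obviously, $\mu'$
is an r.e. semimeasure and, hence, $\mu' \prec M$. Besides,
$\forall x, x', n\ \big((x \subset x'\ \&\ m(x)/\mu(x) > 2^n) \Rightarrow (\mu'(x')/\mu(x') > 2^n/n^2) \Rightarrow (M(x')/\mu(x') \succ 2^n/n^2)\big)$.
Then, by the definitions of $D$ and $DN$, $D(\alpha/\mu) > n \Rightarrow DN(\alpha/\mu) > n - 2\log n - O(1)$.

It remains to show that $\log_2 M\{\alpha : IN(\alpha:\beta) > n\} < -n + O(1)$. Let $A_{n,\beta} = \{\alpha : IN(\alpha:\beta) > n\}$ and
suppose $M(A_{n,\beta}) > 2^{-n+c}$. Being an open set, $A_{n,\beta}$ has a clopen subset $A'$ such that $M(A') > 2^{-n+c}$. Then
$k$ and $T \subset S_k$ exist such that $A' = \bigcup_{x \in T} \Gamma_x$, and thus $\forall x \in T:\ \log_2\big(M(x,\beta_k)/(M(x)M(\beta_k))\big) > n$ and
$2^{-n+c} < \sum_{x \in T} M(x)$. Then $\sum_{x \in T} M(x,\beta_k) > 2^n M(\beta_k) \sum_{x \in T} M(x) > 2^n M(\beta_k)\, 2^{-n+c}$, and therefore
$\sum_{x \in T} M(x,\beta_k) > 2^c M(\beta_k)$, which is impossible for $c$ large enough. Q.E.D.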
2.3 COVERING OF THE CLASSICAL FORMULATION OF PROBABILITY THEORY


The law of independence (as well as that of randomness) is violated only with probability 0, and thus
it is a law of probability theory in the customary “classical” sense. Its meaning is that a randomly
generated sequence must be independent of a sequence given beforehand. This law depends only on
the parameter $\gamma$, and nothing is assumed about $\gamma$ except that it is given in advance (e.g. uniquely
defined by some property in the language in which our law is formulated, such as formal arithmetic).
Note that the formulation of this law ($I(\alpha:\gamma) < \infty$) does not mention the probability distribution $\mu$ at all.
This property of $\alpha$ can be prescribed before specifying the probability distribution of the process
generating $\alpha$ (i.e. this property holds independently of the parameters of probability theory).


In section 1.4 we have seen that this law is more general than the usual laws of probability 
theory. The Independence Principle in 1.4 asserts that this law is realized not only in the usual 
random processes, but in any process of the physical world as well. This encourages one to bring this 
law outside the limits of probability theory and to consider other probabilistic laws only for those 
sequences which satisfy the law of independence. It turns out that this makes other laws unnecessary! 


Theorem 8: Let $\mu$ be an r.e. measure. For any set $A$ such that $\mu(A) = 0$, there exists $\gamma$ such
that $A \subset \{\alpha : D(\alpha/\mu) = \infty\} \cup \{\alpha : I(\alpha:\gamma) = \infty\}$, i.e. any probabilistic law (in the classical sense) is
reduced to the two laws considered above.


If we confine ourselves to sequences satisfying the law of independence (this includes, in accord- 
ance with the Independence Principle, any sequence in the physical world) then any law (recursive 
or not) of classical probability theory is reducible to the “law of randomness”. 


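Proof of Theorem 8: It is easy to see that $\delta_\mu(\alpha) = \log_2 \sup_n (M(\alpha_n)/\mu(\alpha_n))$ is a Martin-Löf test.
Therefore (by Proposition 6), if $D(\alpha/\mu) < \infty$, then $\exists c\,\forall n\ M(\alpha_n) < c\,\mu(\alpha_n)$. Let $A' = A \cap \{\alpha : D(\alpha/\mu) < \infty\}$.
It is clear that $M(A') = 0$ follows from $\mu(A') = 0$. Thus it is sufficient to prove the following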


Lemma 3:
For each set $A'$ such that $M(A') = 0$, a sequence $\beta$ exists on which all elements of $A'$ depend.


Obviously a sequence $A_m \supset A'$ of open sets exists such that $M(A_m) < 2^{-m}$. Let $A'_m \subset S \times \mathbb{Q}^+$
be such that $((x, r_1), (y, r_2) \in A'_m,\ x \subset y) \Rightarrow (x = y,\ r_1 = r_2)$; $A_m = \{\alpha : \exists (x, r) \in A'_m,\ x \subset \alpha\}$;
$(x, r) \in A'_m \Rightarrow M(x) < r < 2M(x)$. Let $\beta$ be a sequence with respect to which $A'_m$ is r.e. Then a recursive set $T$
exists such that $(x, r) \in A'_m \Leftrightarrow \exists y \subset \beta\ (x, y, m, r) \in T$, and $\forall m\,\forall a\ \sum \pi_4(s) < 2^{-m+1}$, where the sum
is over $s \in T$ with $\pi_2(s) \subset a$, $\pi_3(s) = m$ (if $s = (a_1, a_2, a_3, a_4)$ then $\pi_i(s) = a_i$). Now we shall modify $T$ so as
to make $(x, y, m, r) \in T' \Rightarrow l(x) = l(y)$ hold. First we replace each quadruple $(x, y, m, r)$ with $l(x) > l(y)$
by the set of all quadruples $(x, y', m, r)$ with $l(y') = l(x)$, $y' \supset y$. Then we replace each quadruple
$(x, y, m, r)$ with $l(y) > l(x)$ by the set of the quadruples $(x', y, m, r')$ with $l(x') = l(y)$, $x' \supset x$, and
$r' < M'_{x,y,r}(x')$. Here $M'_{x,y,r}(x')$ is defined as follows. We generate approximations from below
of the numbers $M(x')$ ($x' \supset x$, $l(x') = l(y)$) until their sum exceeds $r$. If this happens, we stop the process
at the previous step, and the value obtained there is $M'_{x,y,r}(x')$. Otherwise $M'_{x,y,r}(x') = M(x')$. Let $T''$ be
the set of triples $(x, y, m)$ such that $\exists r\,(x, y, m, r) \in T'$. If $s \in T''$, then $r(s) = \sup\{r' : (s, r') \in T'\}$. The
resulting $T''$ and $r$ satisfy the following conditions:

1) $(x, y, m) \in T'' \Rightarrow l(x) = l(y)$
2) $A_m = \{\alpha : \exists (x, y, m) \in T'',\ y \subset \beta,\ x \subset \alpha\}$
3) $(x, y, m) \in T'',\ y \subset \beta \Rightarrow r(x, y, m) > M(x)$
4) $\forall m\,\forall y\ \sum_x r(x, y, m) < 2^{-m+1}$

It follows from 4) that $\sum r(x, y) < \infty$, where $r(x, y) = \sum_m (2^m/m^2)\, r(x, y, m)\, M(y)$.
Therefore $r(x, y) \prec M((x, y)^*)$. Obviously $\forall \alpha \in A_m\ \exists x, y:\ l(x) = l(y),\ x \subset \alpha,\ y \subset \beta,\ r(x, y, m) > M(x)$.
Hence $\forall \alpha \in A_m\ I(\alpha:\beta) > \log_2(2^m/m^2) - O(1)$. Then $\forall \alpha \in A'\ I(\alpha:\beta) = \infty$. Q.E.D.
3. APPLICATIONS TO INTUITIONISTIC MATHEMATICS


3.1 A DIGRESSION 


It is known that second order theories are much more complicated logically than first order ones. The
theories permitting free handling of elements of a continuum (in particular, quantification over them) belong to the
second order. First order theories admit quantification only over constructive objects.


At the beginning of this century a number of mathematicians (the intuitionists) suggested that these complications 
are artificial and even dangerous in the sense of the possibility of paradoxes. In particular, they asserted that elements 
of a continuum (unlimitedly elongated sequences, unlimitedly small points, etc.) do not make sense as logically defined 
formal objects, but are taken from the physical world. Therefore, the applicability of usual logical operations to them 
is not a priori obvious when these operations have no analogues in the physical world. For example, in order “to
apply” a classical universal quantification, one would need the ability to scan all conceivable sequences; this is not, 
of course, physically implementable. It was suggested that, having restricted our logical means only to such formal 
procedures and postulates that have closer connection with “physical intuition”, we would obtain a mathematics whose 
proof power is more mensurable and less suspicious. The evident difficulty is in the obscurity of our physical intuition. 
This brings up difficulties in the choice of the formal principles which would reflect adequately the nature of sequences 
being generated as a result of events of the physical world. Brouwer’s original idea of sequences generated by “free
choice” of their terms does not clarify the situation enough, since the concept of “freedom” is itself obscure. The result
is a great variety of intuitionistic principles and theories that strengthen, weaken or contradict each other.


As a rule, these theories are too strong, on the one hand, but too weak, on the other. They are strong to the
extent that the connection of their principles with physical intuition ceases to be obvious. This is aggravated by the
fact that, with respect to the possibility of inconsistency, these theories often turn out to be equivalent
to the corresponding classical ones (which kills the hope for increased “reliability” of intuitionism). They are weak
to the extent that they leave unsolved many natural questions about the validity of other principles of intuitionistic
reasoning. The latter fact generates manifold possibilities of extending these theories, and provides abundant material
for research. However, it rules out obtaining a theory which gives us some feeling of completeness
and is suitable for “canonization” as the universal foundation of intuitionistic mathematics.


In this section we will try partly to overcome these difficulties by using an axiom schema which corresponds to
the Independence Principle (see 1.4). On the one hand, this Principle seems to have more tangible (physically clear)
foundations than many arguments about the nature of “free choice”. It turns out that, with respect to consistency
and mensurability of the proof power, the theory obtained is equivalent to classical first order arithmetic. More
precisely, the intuitionistic second order arithmetic (analysis) considered below is a conservative extension of
classical first order arithmetic formulated without disjunction and existential quantification. On the other hand,
it is in a sense complete. More precisely, it has no essential extension which would retain the indicated property of
conservativeness (i.e. an extension obtained by adding an essentially new principle which is “purely logical”, i.e. implies
no new theorems of classical number theory). All these “virtues” of the theory below are connected with the fact that
the Independence Postulate excludes the existence of sequences containing unbounded information about the truth of
mathematical statements. It is natural to attribute the usual troubles of second order theories to such fancy “logical”
sequences, which in fact do not exist in the physical world.
3.2 THE PRELIMINARY CALCULUS A


Our theory AI will be constructed in Section 3.3 by adding a group of axioms to the basic calculus
A, described below. A is formulated in the usual language of second order arithmetic. This language
is obtained from that of first order arithmetic (see [KL67] section 38) by adding a countable list
of second order variables denoting sequences (functions) of natural numbers and adopting the term
$\alpha(t)$ and formulas $\forall\alpha F$ and $\exists\alpha F$ for any second order variable $\alpha$, term $t$, and formula $F$. A formula
is called absolute if it is constructed from equalities between terms with the aid of conjunction,
implication, negation and universal quantification over first order variables. Absolute formulas have
identical meaning and equivalent provability in intuitionistic and classical theories. In section
0.2 a definition of the pair of numbers is given. In the same fashion we give meaning to the notion of
the pair of terms and to expressions $(\alpha_1, \ldots, \alpha_n)$, $\alpha(a_1, \ldots, a_n)$, $(\alpha, \beta)$, etc. Taking liberties with the
language, we use the notation $n = \mathrm{Pr}_t\,\tau$ ($n$ equals the projection on variable $t$ of the term $\tau$) for the fact
$\exists s\,(\tau(s) = n + 1\ \&\ (\forall s' < s:\ \tau(s') = 0))$ (i.e. $n + 1$ is the first non-zero term of the sequence $\tau(t)$). Handling
the expression $\mathrm{Pr}_t$ like a term will never cause any misunderstanding, in particular thanks to (3.2.2).
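The projection $\mathrm{Pr}_t\,\tau$ can be read operationally as a search for the first non-zero term. In the hedged Python sketch below, a sequence is modelled as a function from naturals to naturals and the function name `pr` is hypothetical; because running code cannot search forever, the sketch inspects only the first `bound` terms, whereas in the calculus the search is unbounded and the Markov Principle (3.2.2), $\neg\forall n\ \alpha(n) = 0 \Rightarrow \exists n\ \alpha(n) \neq 0$, is what guarantees that it yields a term whenever $\tau$ is not identically zero.

```python
from typing import Callable, Optional


def pr(tau: Callable[[int], int], bound: int) -> Optional[int]:
    """n such that n + 1 is the first non-zero term of tau(0), tau(1), ...;
    only the first `bound` terms are inspected (None if all are zero)."""
    for s in range(bound):
        if tau(s) != 0:
            return tau(s) - 1
    return None


print(pr(lambda s: [0, 0, 5, 7][s] if s < 4 else 0, 10))  # 4
```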


The postulates of A consist of the postulates of first order arithmetic (see [KL67], p.387, List 
of Postulates, Schema 8 is taken in the intuitionistic version 8’) and three second order postulates: 


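Schema of Choice: $\forall n\,(\neg A \Rightarrow \exists x\,B(x)) \Rightarrow \exists\alpha\,\forall n\,(\neg A \Rightarrow B(\mathrm{Pr}_t\,\alpha(n, t)))$   (3.2.1)
Markov Principle: $\neg\forall n\ \alpha(n) = 0 \Rightarrow \exists n\ \alpha(n) \neq 0$   (3.2.2)
Axiom of Countability: $\exists\alpha\,\forall\beta\,\exists k\,\forall n\ \beta(n) = \mathrm{Pr}_t\,\alpha(k, n, t)$   (3.2.3)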


Axiom (3.2.3) asserts that the set of intuitionistic sequences is countable. Under the interpretation
of intuitionistic sequences as sequences of results of real macro-events in the physical world,
this axiom corresponds to the customary statement on the existence of a countable basis of open
sets in space-time. We do not discuss the axioms of A in detail, since they are not original. We
observe only that for the construction of any complete calculus (one that satisfies Theorem 10), it
is necessary to adopt either these axioms (at least under double negation), or their negations, or their
equivalence to some undecidable absolute statements of number theory. The last two variants seem
less natural. It is known that (3.2.1)-(3.2.3) are inconsistent with the principles of continuity and
bar-induction. In this respect the calculus A more closely resembles Kleene’s theory of recursive realizability.
Of course, the calculus A is still too weak. Nonetheless, we have


Proposition 9: For any formula $F$ an absolute formula $P$ exists such that $A \vdash F \Leftrightarrow \forall\alpha\,\exists\beta\,P$.


Proof of Proposition 9: The proof is based on the fact that the axioms of A allow us to introduce
a concept analogous to Kleene’s recursive realizability, by using the universal sequence $\alpha$ from axiom
(3.2.3). Namely, the concept “a number $x$ realizes a formula $F$ with respect to a sequence $\alpha$” is
defined in the same way as in Kleene’s book (cf. Introduction to Metamathematics, Chapter 2),
but recursiveness of all the functions used is replaced by recursiveness with respect to $\alpha$. It is easy
to prove in A the equivalence of any formula $F$ to the existence of a realization of $F$ with respect to
a universal $\alpha$. The latter assertion is equivalent to the fact that for any $\beta$ there exist a sequence $\alpha$
and a number $x$ realizing $F$ with respect to $(\alpha, \beta)$. It is easy (though bulky) to check that A contains
all the axioms necessary for formalizing these arguments, i.e. the deduction of $F \Leftrightarrow \forall\beta\,\exists\alpha\,P$, where $P$
is the absolute formula expressing that $\alpha(0)$ is the realization of $F$ with respect to $(\alpha, \beta)$. Q.E.D.




3.3 THE CALCULUS AI; ITS RELATIVE CONSISTENCY AND COMPLETENESS 


Let $P(n)$ be an absolute formula with a single free variable $n$. A finite binary sequence $p$ is called
compatible with $P$ (denoted $p \subset P$) if $\forall n < l(p):\ (p(n) = 0 \Leftrightarrow P(n))$. The abbreviation $I(\alpha:P)$ means
$\sup\{I(\alpha:p) : p \subset P\}$. For a given $P$, the statement $I(\alpha:P) < c$ can easily be expressed by an absolute
formula with free variables $\alpha$, $c$. Using that, we introduce an axiom schema with a parameter $P$:


Independence Postulate: $\forall\alpha\,\exists c\ I(\alpha:P) < c$   (3.3.1)
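The compatibility relation $p \subset P$ on which (3.3.1) rests is a simple finite check, and a short Python sketch may make it concrete. It reads the condition as an equivalence ($p(n) = 0$ exactly when $P(n)$ holds), models $P$ by an ordinary decidable predicate, and uses the hypothetical names `compatible` and `is_even`; in the theory $P$ is an arbitrary absolute formula, and $I(\alpha:P)$, a supremum over all compatible $p$, is not computable in this way.

```python
from typing import Callable, Sequence


def compatible(p: Sequence[int], P: Callable[[int], bool]) -> bool:
    """p is compatible with P iff for all n < l(p): p(n) == 0 exactly when P(n)."""
    return all((p[n] == 0) == P(n) for n in range(len(p)))


def is_even(n: int) -> bool:
    # Stands in for an absolute formula P(n) with one free variable.
    return n % 2 == 0


print(compatible([0, 1, 0, 1], is_even), compatible([0, 0], is_even))  # True False
```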


One more informational statement must be valid in AI. The property of completeness (defined
in section 1.5) is expressible by an absolute formula $C(\alpha)$. Our last axiom asserts that the completion of
sequences mentioned in Proposition 5 is feasible within the bounds of the theory:


$\forall\alpha\,\exists\gamma\ C(\alpha, \gamma)$   (3.3.2)


The double negation of this axiom follows from the weaker statement $\neg\exists\alpha\,\forall\gamma\,\neg C(\alpha, \gamma)$, inasmuch
as we can use the existence of a “universal” sequence by axiom (3.2.3). Analogously, the double
negation of (3.3.1) follows from the statement $\neg\exists\alpha\,\forall c\ I(\alpha:P) > c$. These weaker versions are sufficient
for our purposes, but we chose the formulations (3.3.1) and (3.3.2) because they are simpler.


Definition 6. A theory is called absolute if for every closed formula $F$ an absolute (see 3.2)
formula $P$ exists such that $\neg F \Leftrightarrow P$ is provable in this theory.


The theory of recursive realizability of S.C. Kleene is an example of a theory known to be 
absolute. This is the theory obtained from A by replacing (3.2.3) with 


Church’s thesis (CT): $\forall\beta\,\exists k\,\forall n:\ \beta(n) = U(k, n)$   (3.3.3)


where $U(k,n)$ is a universal partial recursive function. Condition (3.3.3) is obtainable from (3.2.3)
by imposing the condition of general recursiveness on $\alpha$. Our theory AI is, of course, not absolute,
inasmuch as the formula $\neg(CT)$ is neither deducible in it nor refutable, nor can it be reduced to
any absolute formula. This formula, however, is the only one of this sort; namely,


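Lemma 4. For any closed formula $F$ four absolute formulas $P_1, P_2, P_3, P_4$ exist such that
the following statements are deducible in AI:
$\neg\neg(P_1 \vee P_2 \vee P_3 \vee P_4)$; $(P_1 \Rightarrow F)$; $(P_2 \Rightarrow \neg F)$; $\neg\neg(P_3 \Rightarrow (F \Leftrightarrow (CT)))$; $\neg\neg(P_4 \Rightarrow (F \Leftrightarrow \neg(CT)))$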


Thus, to get an absolute theory, an axiom implying the truth or the falsity of (CT) is necessary.
It turns out that this is sufficient as well. The theory AI + (CT) is equivalent to the theory of recursive
realizability of S.C. Kleene and is consequently absolute. It is of little interest for our purposes, since
by admitting Church’s thesis (3.3.3) we would exclude from consideration all non-recursive sequences
(for instance, the random ones). To the degree that (CT) is a very strong axiom, the axiom $\neg(CT)$ is,
conversely, very weak. Thus one might not have expected that the theory AI + $\neg(CT)$ is also absolute.
This fact follows from the following theorem.


Theorem 10: The class of absolute closed formulas deducible in AI + $\neg(CT)$ coincides with
the class of absolute theorems of classical first order arithmetic. No essential extension (i.e.
one containing new theorems of the form $\neg F$) of the theory AI + $\neg(CT)$ has this property.


Thus the theory AI + $\neg(CT)$ is a maximal conservative extension of classical arithmetic. This
property is, in a sense, consistency and completeness relative to classical arithmetic. The basic goal of
the construction of this theory was the study of the possibilities given by the axiom schema (3.3.1).
Proof of Theorem 10: It is sufficient to establish for each closed formula $F$ a corresponding
absolute formula $\bar F$ such that $(\mathrm{AI} + \neg(CT)) \vdash \neg F \Leftrightarrow \bar F$, and if $F$ itself is absolute, then $\neg F \Leftrightarrow \bar F$ is
deducible in first order arithmetic. Besides, one needs to show that every axiom $F$ of AI + $\neg(CT)$ will be
converted into a theorem $\neg\bar F$ of first order arithmetic and that the rules of deduction will be converted into
derived first order deduction rules. We shall indicate the transformation of $F$ into $\bar F$ and explain its
meaning without writing out all the routine formal deductions. Due to Proposition 9 it is sufficient to
restrict ourselves to formulas of the form $F = \forall\alpha\,\exists\beta\,P(\alpha, \beta)$, where $P$ is absolute. We say that $F$ is
rejected on $\gamma \in \Omega$ if for any recursive function $r: \mathbb{N} \to \mathbb{N}$ it is false that for any recursive operator $k: \Omega \to \Omega$
applicable to $\gamma$, $k' = r(k)$ is also applicable to $\gamma$ and $P(\alpha, \beta)$ holds, where $\alpha = k(\gamma)$ and $\beta = k'(\gamma)$. Let
$\mu$ be a recursive continuous measure. It turns out that the equivalence between $\neg F$ and the formula
“$F$ is rejected for $\mu$-almost all $\gamma$” is deducible in AI + $\neg(CT)$.


The latter formula can be written in an absolute form and chosen as $\bar F$. The point is that the
quantifier “for almost all $\gamma$”, in contrast to the quantifier “for all $\gamma$”, is expressible in the first order
language. Obviously the formula “$F$ is rejected on $\gamma$”, being absolute, can be presented in the form
$\forall n_k\,\neg\forall n_{k-1}\,\neg \ldots \forall n_0\,\neg R(\gamma, n_0, n_1, \ldots, n_k)$, where $R$ is a recursive predicate, monotonic in each of the
arguments $n_i$ (upward for even $i$ and downward for odd $i$). Let us show by induction on $i$ how the predicate
$\mu\{\gamma : \forall n_{i-1}\,\neg\forall n_{i-2}\,\neg \ldots \forall n_0\,\neg R(\gamma, n_0, \ldots, n_k)\} > r$ is expressed by an absolute formula.
For $i = 0$ it is trivial. Now let, for a given $i$, our predicate be expressed in the form $S_i(r, n_i, \ldots, n_k)$.
Then $\forall n_i\,\forall r' > (1 - r)\ \neg S_i(r', n_i, \ldots, n_k)$ can serve as $S_{i+1}(r, n_{i+1}, \ldots, n_k)$. Thus it remains to show that $\neg F$
is equivalent in AI + $\neg(CT)$ to the assertion “$F$ is rejected for $\mu$-almost all $\gamma$”.


Lemma 5: Let $\mu$ and $\mu'$ be r.e. measures and let $\mu$ be continuous. Then recursive operators $P$
and $P'$ on $\Omega$ exist such that:

1) $\forall A \subset \Omega\ \ \mu'(A) = \mu(P^{-1}(A))$

2) $\forall \omega, \alpha:\ P(P'(\omega)) = \omega$ and $P'(P(\alpha)) = \alpha$

3) $P'$ (respectively $P$) is defined on $\mu'$-almost all (resp. $\mu$-almost all) non-recursive sequences.


The proof of this lemma follows from Theorem 3.1 b) in [L70]. Since the property “$F$ is rejected
on $\gamma$” is invariant with respect to any recursive reversible transformation of $\gamma$, it is sufficient to prove
the equivalence of $\neg F$ to “$F$ is rejected for $\mu$-almost all $\gamma$” just for $\mu = B_0$ (the uniform measure
on $\Omega$). By virtue of the same invariance and Kolmogorov’s 0-1 law (see [K33]), the set $A$ of all $\gamma$
on which $F$ is rejected can only be of measure 0 or 1 with respect to $B_0$. Hence, if $R$ is the set of
all recursive sequences, the measure of $A \setminus R$ or of $(\Omega \setminus A) \setminus R$ equals 0 with respect to any other
recursive $\mu$ as well. Then by virtue of Theorem 8 a sequence exists (and it can easily be defined
by an absolute formula) on which all complete $\gamma$ from this set depend. The axioms of AI + $\neg(CT)$
imply that any universal sequence (from axiom 3.2.3) is non-recursive, equivalent to a complete one,
and independent of sequences defined by absolute formulas. Therefore, in the case $\mu(A) = 0$, $F$
is not rejected on a universal $\gamma$ and $\neg\neg F$ holds. In the opposite case $\neg F$ holds for analogous reasons.
These arguments can easily be transformed into formal proofs in AI + $\neg(CT)$. Each of the two cases
gives the implication in one of the directions between $\neg F$ and “$F$ is rejected for $\mu$-almost all $\gamma$”. Q.E.D.
4. APPLICATION TO THE THEORY OF TURING DEGREES


4.1 INDEPENDENCE AND NEGLIGIBLE SETS 


One of the natural fields for application of algorithmic information theory is the theory of
Turing degrees. It is natural to interpret the recursive reducibility of $\alpha$ to $\beta$ as meaning that $\beta$ contains
all information about $\alpha$, more precisely, all information except a finite amount equal to the
complexity of the reducing algorithm. However, the informational concepts are subtler and less awkward
than reducibility degrees. In particular, the former concepts, unlike the latter ones, are invariants
applicable to finite objects as well (Theorem 4 shows that $I(x:y)$ is invariant to within a constant under
all recursive reversible transformations of $\mathbb{N}$). Algorithmic information theory gives new interesting
possibilities. One of them is the introduction of the concept of independence in addition to the
concept of reducibility. In the language of reducibility degrees it would also be possible to say that
$\alpha$ and $\beta$ are independent if any sequence reducible to both of them is trivial (recursive). But a
simple example shows that this definition does not match intuition. Let $\alpha$ and $\gamma$ be 0,1-sequences
obtained in random processes of independent trials, where the probability of $\alpha_n = 0$ is 1/2, and the
probability of $\gamma_n = 0$ is 0.99. Let $\beta_n = \alpha_n \oplus \gamma_n$ (addition mod 2). Then $\alpha$ and $\beta$ are almost always
such that no nontrivial sequence reduces simultaneously to both of them, though 99 percent of the terms
of $\alpha$ and $\beta$ coincide (in view of which it is hard to consider them as independent).
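The example can be explored numerically with a compression-based proxy, a common heuristic that replaces the uncomputable complexity $K$ by the length of a compressed encoding. The Python sketch below is only suggestive and is not the method of this paper: it uses finite prefixes of length 8000, a fixed random seed, zlib's compressed length as a crude upper bound for complexity, and the hypothetical helper names `c`, `info_proxy` and `as_text`; it takes $\beta_n = \alpha_n \oplus \gamma_n$ as in the example. The proxy for the information shared by $\alpha$ and $\beta$ should come out large, while for $\alpha$ and an independently generated sequence it should be much smaller.

```python
import random
import zlib
from typing import List


def c(data: bytes) -> int:
    # zlib-compressed length: a crude, computable upper-bound proxy for complexity.
    return len(zlib.compress(data, 9))


def info_proxy(x: bytes, y: bytes) -> int:
    # Heuristic stand-in for I(x:y) ~ K(x) + K(y) - K(x, y).
    return c(x) + c(y) - c(x + y)


def as_text(bits: List[int]) -> bytes:
    # One ASCII '0'/'1' per bit, so that the compressor can match long aligned runs.
    return bytes(48 + b for b in bits)


rng = random.Random(1)
n = 8000
alpha = [rng.randrange(2) for _ in range(n)]          # P(alpha_n = 0) = 1/2
gamma = [int(rng.random() < 0.01) for _ in range(n)]  # P(gamma_n = 0) = 0.99
beta = [a ^ g for a, g in zip(alpha, gamma)]          # beta_n = alpha_n XOR gamma_n
other = [rng.randrange(2) for _ in range(n)]          # generated independently of alpha

shared = info_proxy(as_text(alpha), as_text(beta))
indep = info_proxy(as_text(alpha), as_text(other))
print(shared, indep)
```

A finite sample can only suggest the behavior of the infinite sequences; the proxy says nothing rigorous about $I(\alpha:\beta)$.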


We shall use the concept of independence from Chapter 1 to define “negligible sets” of sequences.
This allows us to study properties of Turing degrees “up to negligibility”. Many exotic types of
Turing degrees are known. Such are, for example, “minimal” degrees, which contain indivisible
information (any part of the information of such a degree β, i.e., any degree α ≤ β, is equivalent to 0 or β).
The existence of such degrees is proved by diagonal methods, and it would be strange if the
corresponding sequences arose in reality. In particular, it has been proved (see [F-67]) that such
sequences cannot appear in any combination of random and recursive processes. One may hope
that many complications of the theory of Turing degrees are caused by exotic examples of this kind,
and that the theory of “real degrees” is simpler. We shall see below that this is so, but only partially.
We call a set A ⊂ Ω inaccessible if its complement is closed under every recursive operator F
(i.e., ∀α (α ∉ A ⇒ F(α) ∉ A)).
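A finite toy analogue may help with the definition of inaccessibility. In the sketch below, which is an illustration of our own rather than anything from the paper, Ω is replaced by a finite set, the recursive operators by an arbitrary finite list of maps, and the helper names are hypothetical. `inaccessible_hull` computes the smallest superset of A whose complement is closed under all the maps, i.e. the set of points that some finite composition of the maps sends into A; this mirrors the set A_1 = ⋃_F F^{-1}(A) used in the proof of Proposition 11.

```python
from typing import Callable, Iterable, List, Set


def inaccessible_hull(universe: Iterable[int], a: Set[int],
                      maps: List[Callable[[int], int]]) -> Set[int]:
    """Smallest superset of `a` whose complement is closed under every map.

    A point x joins the hull as soon as some map sends it into the hull;
    iterating to a fixed point collects preimages under all finite compositions.
    """
    points = set(universe)
    hull = set(a)
    changed = True
    while changed:
        changed = False
        for x in points - hull:
            if any(f(x) in hull for f in maps):
                hull.add(x)
                changed = True
    return hull


def is_inaccessible(universe: Iterable[int], s: Set[int],
                    maps: List[Callable[[int], int]]) -> bool:
    """True if no map sends a point outside `s` into `s`."""
    return all(f(x) not in s for x in set(universe) - s for f in maps)


# Hypothetical stand-ins for recursive operators on a finite "Omega" = {0, ..., 7}.
maps = [lambda x: x // 2, lambda x: x ^ 1]
hull = inaccessible_hull(range(8), {3}, maps)
print(sorted(hull), is_inaccessible(range(8), hull, maps))  # prints: [2, 3, 4, 5, 6, 7] True
```

Here the complement {0, 1} of the hull is mapped into itself by both maps, as the definition requires, while the original set {3} is not inaccessible because 2 ⊕ 1 = 3.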

Proposition 11: The following four properties of a set A ⊂ Ω are equivalent:

1) There exists a sequence α ∈ Ω on which all β ∈ A depend (i.e., ∃α ∀β ∈ A: I(α:β) = ∞).

2) A is a subset of some inaccessible set A_1 whose measure is 0 for every r.e. measure.

3) A is a subset of some inaccessible set A_1 of measure 0 with respect to some r.e. measure μ not
concentrated on a countable set (μ(Ω ∖ B) > 0 for every countable B).

4) M(A) = 0.

Proof: 1) ⇒ 4) and 4) ⇒ 1) follow from Theorem 7 and Lemma 3, respectively. Clearly F(M), the
image of M under an arbitrary recursive mapping F: Ω → Ω, is an r.e. semimeasure, and hence
F(M) ≺ M. Therefore, if M(A) = 0, then A_1 = ⋃_F F^{-1}(A) is inaccessible, A ⊂ A_1, and M(A_1) = 0. This gives
4) ⇒ 2) ⇒ 3). Lemma 5 implies that 3) ⇒ 2). Any r.e. semimeasure is the image of an r.e. measure under
a recursive mapping Ω → Ω (see [L70], section 3.2). This gives 2) ⇒ 4). Q.E.D.
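The measure-theoretic steps of the proof can be written out as below. This is a sketch under stated assumptions: M is treated as an outer measure on Ω, F(M)(A) = M(F^{-1}(A)) denotes the image semimeasure, F(M) ≺ M is read as F(M) ≤ c_F·M, the constants c_F, c_μ and the mapping G are introduced here only for illustration, and F ranges over the countably many recursive mappings (including the identity, so that A ⊂ A_1).

```latex
% 4) => 2): countable subadditivity plus F(M) \le c_F M, and every r.e. measure \mu \le c_\mu M
M(A_1) \;\le\; \sum_{F} M\bigl(F^{-1}(A)\bigr) \;=\; \sum_{F} F(M)(A) \;\le\; \sum_{F} c_F\, M(A) \;=\; 0,
\qquad \mu(A_1) \;\le\; c_\mu\, M(A_1) \;=\; 0 .

% 2) => 4): write M = G(\mu) with \mu an r.e. measure and G:\Omega\to\Omega recursive;
% inaccessibility of A_1 gives G^{-1}(A_1) \subseteq A_1.
M(A) \;=\; \mu\bigl(G^{-1}(A)\bigr) \;\le\; \mu\bigl(G^{-1}(A_1)\bigr) \;\le\; \mu(A_1) \;=\; 0 .
```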

We call sets having any of these four properties negligible (this neglect is, of course, based
on our belief in the Independence Principle). We call two sets A and B i-equivalent if their symmetric
difference is negligible. A “property of Turing degrees” is a set A ⊂ Ω invariant under
Turing equivalence. Studying such properties up to i-equivalence, we can exclude from consideration some
properties of “exotic”, “unreal” degrees, which simplifies the study. We denote by K the Boolean
algebra of Borel sets of Turing degrees, and by L its quotient algebra with respect to i-equivalence.
If A ∈ K, then Ā ∈ L denotes the element generated by A, i.e., the i-equivalence class containing A.
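The quotient by i-equivalence is well defined because, by property 4), negligible sets are closed under subsets and finite unions (M(A ∪ B) ≤ M(A) + M(B) = 0), and A Δ C ⊆ (A Δ B) ∪ (B Δ C) gives transitivity. The hypothetical Python sketch below illustrates this on a finite Boolean algebra, assuming for the toy that the "negligible" sets are exactly the subsets of a fixed set `core`; then A and B are equivalent precisely when A ∖ core = B ∖ core, which gives each class a canonical label.

```python
from itertools import combinations


def all_subsets(universe):
    """Every subset of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def i_equivalent(a, b, core) -> bool:
    """A ~ B iff the symmetric difference A ^ B is 'negligible' (toy: a subset of core)."""
    return (frozenset(a) ^ frozenset(b)) <= frozenset(core)


def i_classes(universe, core):
    """Group all subsets of `universe` into i-equivalence classes.

    A ^ B <= core holds exactly when A - core == B - core, so A - core
    serves as a canonical label for the class of A.
    """
    core = frozenset(core)
    classes = {}
    for s in all_subsets(universe):
        classes.setdefault(s - core, []).append(s)
    return classes


classes = i_classes(range(6), {4, 5})
print(len(classes))  # prints: 16 (one class per subset of {0, 1, 2, 3})
```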
4.2 TYPES OF TURING DEGREES


In Section 1.5 the concept of “sequence completeness” was considered. The set of incomplete sequences
has a property very close to negligibility. Namely, Item 2) of Proposition 11 is obtained from Item 1)
of Proposition 5 by omitting the word “total”. Thus, incomplete sequences cannot arise in a process
that can be completed to a total one (in particular, in a process whose running time is bounded by a total
recursive function). It is natural to consider properties of the Turing degrees generated by complete
sequences. It turns out that they are organized quite simply: up to i-equivalence, there are only four of them.


Theorem 12: Let A ⊂ Ω be the closure under Turing equivalence of a Borel set of
complete sequences. Then A is i-equivalent to one of the following four sets:

a) The empty set; 

b) The set of recursive sequences; 

c) The set of all complete sequences; 

d) The set of all complete non-recursive sequences. 


Thus, the properties of a complete sequence (up to negligible sets) depend only on whether it is
recursive, and the recursive and the complete non-recursive sequences form the two most natural elements (atoms) of the algebra L.


Proof: It follows from Lemma 5 that any set A of non-recursive sequences that is invariant under
Turing equivalence either has measure 0 with respect to every recursive measure μ, or (for every such μ)
contains μ-almost all non-recursive sequences. Then, by Theorem 8, there exists γ such that all
complete non-recursive sequences in A (in the first case) or in the complement of A (in the second case)
depend on γ. Since the invariant set A contains either all recursive sequences or none,
we obtain that A is i-equivalent to one of the four sets mentioned in Theorem 12. Q.E.D.


Let us make a few remarks about the remaining types of Turing degrees (those containing no complete sequence).
Even the proof that their union is not a negligible set turns out to be highly non-trivial. It was
given by V. V’jugin [L77], who proved that L contains an infinitely divisible element and countably many
atoms. Only two of these atoms (namely, b) and d) of Theorem 12) contain complete sequences.
V’jugin’s constructions are very complicated, and some of his proofs have not yet been published.
5. BRIEF REFERENCES AND THE BIBLIOGRAPHY


The following remarks do not claim to present the history of the question and concern mainly the
works directly used above. Algorithmic information theory originated with A.N. Kolmogorov’s
and R.J. Solomonoff’s algorithmic approach to the concepts of information, randomness and a priori
probability. This idea was based on the fundamental discovery of an optimal (up to an additive
constant) coding method for constructive objects and of the recursively invariant concept of complexity
arising from it (cf. [K65, S64]). “Uspekhi Mat. Nauk” reported on Kolmogorov’s talks on this
subject at the Moscow Mathematical Society in 1961 and subsequent years. Some of R.J. Solomonoff’s
ideas in the field were also presented in preprints and in [M62]. See also [Ma64] and [Ch66].


However, in spite of the depth of the main idea, the basic quantities were not expressed with
perfect mathematical accuracy. Many important relationships hold only up to an error of the order of
the logarithm of the complexity. This error is, of course, negligible in comparison with the complexity
itself, but it can exceed derived quantities such as mutual information, deficiency of randomness, or
conditional a priori probability, because these quantities are expressed using subtraction and division.
Thus, the leading terms, of the order of the complexity, can cancel, leaving only terms smaller than
the logarithm of the leading ones. As a result, these errors seriously distorted the picture and hindered
the development of a transparent theory.
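A worked instance of this cancellation, using one standard symmetric expression for mutual information in terms of plain complexity C (the notation is assumed here, not taken from the cited works): for independent random strings x and y of length n, each complexity is known only up to O(log n), the leading terms cancel, and what survives is of the same order as the error.

```latex
\begin{aligned}
I(x:y) &= C(x) + C(y) - C(x,y) \\
       &= \bigl(n + O(\log n)\bigr) + \bigl(n + O(\log n)\bigr) - \bigl(2n + O(\log n)\bigr) \\
       &= O(\log n).
\end{aligned}
```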


With respect to the concept of randomness, these difficulties were overcome in the very important
work [ML66]. But the concept of random sequences proposed there applied only to recursive
measures and did not cover other important cases. Some other difficulties were overcome in [L70],
where we introduced the universal measure as the a priori probability and complexity
as its negative logarithm. Very interesting studies of the concept of randomness were made by C.P. Schnorr [Sc71].


For the concept of information, the problem of giving a precise definition proved to be more
difficult. The first non-trivial results were obtained by A.N. Kolmogorov and L.A. Levin in 1967.
The initial definition of mutual information [K65] was not symmetric and was monotonic
in only one of its arguments. Kolmogorov and Levin [K68, L70] showed that this quantity
coincides approximately (up to the logarithm of the complexity) with a symmetric expression and
is therefore approximately monotonic in its second argument as well. This yields the intuitive
and theoretically desirable property that a given text contains no less information about a
pair of texts than about either of them.
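In standard modern notation (assumed here; the cited papers may state it differently), with plain complexity C and I(z:w) = C(w) − C(w | z) denoting the information in z about w, the symmetry result and the monotonicity it implies read roughly as follows. The middle step uses only C(z | x, y) ≤ C(z | x) + O(1), since x can be recovered from the pair (x, y).

```latex
\begin{aligned}
C(w) - C(w \mid z) &= C(z) - C(z \mid w) + O\bigl(\log C(z,w)\bigr), \\[4pt]
I\bigl(z:(x,y)\bigr) &= C(z) - C(z \mid x,y) + O\bigl(\log C(x,y,z)\bigr) \\
  &\ge C(z) - C(z \mid x) - O\bigl(\log C(x,y,z)\bigr) \\
  &= I(z:x) - O\bigl(\log C(x,y,z)\bigr).
\end{aligned}
```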


In [L70] the universal measure was introduced. Its negative logarithm (equal, up to an additive constant,
to the length of the shortest code for the optimal self-delimiting algorithm) turned out to be a more
satisfactory complexity measure on N than the original proposal of [K65]. It made it possible to improve
the definitions of randomness ([L73]) and information ([L74]). The new definition of information is monotonic
up to a constant (instead of logarithmic) error and can be extended to the case of infinite sequences.
This work is connected with the very subtle and non-trivial results of P. Gács [G74] concerning the
differences between the symmetric and the asymmetric expressions for information. A number of the
results of [K68, L70, L73, L74, G74] were rediscovered independently by G. Chaitin in his famous
work [Ch75]. Versions of some results of the present work were reported in [L74, L76, L77].
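The role of self-delimiting codes can be illustrated by the Kraft inequality: the codewords p of any prefix-free code satisfy Σ 2^{−|p|} ≤ 1, which is what makes a sum of 2^{−|p|} over the programs of a self-delimiting algorithm a semimeasure. The universal measure itself is not computable, so the Python sketch below only checks this inequality for one concrete self-delimiting code, the Elias gamma code, which is our choice for illustration and does not come from the paper; the helper names are hypothetical.

```python
from fractions import Fraction


def elias_gamma(n: int) -> str:
    """Elias gamma code of a positive integer: a self-delimiting (prefix-free) code."""
    if n < 1:
        raise ValueError("n must be positive")
    b = bin(n)[2:]                 # binary representation, leading bit is 1
    return "0" * (len(b) - 1) + b  # length 2*floor(log2 n) + 1


def is_prefix_free(words) -> bool:
    """If some word is a prefix of another, it is also a prefix of its
    immediate successor in sorted order, so comparing neighbours suffices."""
    ws = sorted(words)
    return all(not ws[i + 1].startswith(ws[i]) for i in range(len(ws) - 1))


def kraft_sum(words) -> Fraction:
    """Exact value of the sum of 2^(-|w|) over the codewords."""
    return sum((Fraction(1, 2 ** len(w)) for w in words), Fraction(0))


codes = [elias_gamma(n) for n in range(1, 1024)]
print(is_prefix_free(codes), kraft_sum(codes))  # prints: True 1023/1024
```

For the gamma codes of 1, …, 2^k − 1 the sum is exactly 1 − 2^{−k}, approaching but never exceeding 1.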